Revision Author Date Message Commit Date
6d8c01b bcm2708: alsa sound driver Signed-off-by: popcornmix <popcornmix@gmail.com> alsa: add mmap support and some cleanups to bcm2835 ALSA driver snd-bcm2835: Add support for spdif/hdmi passthrough This adds a dedicated subdevice which can be used for passthrough of non-audio formats (ie encoded a52) through the hdmi audio link. In addition to this driver extension an appropriate card config is required to make alsa-lib support the AES parameters for this device. snd-bcm2708: Add mutex, improve logging Fix for ALSA driver crash Avoids an issue when closing and opening vchiq where a message can arrive before service handle has been written alsa: reduce severity of expected warning message snd-bcm2708: Fix dmesg spam for non-error case alsa: Ensure mutexes are released through error paths alsa: Make interrupted close paths quieter BCM270x: Add onboard sound device to Device Tree Add Device Tree support to alsa driver. Add device to Device Tree. Don't add platform devices when booting in DT mode. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> bcm2835: access controls under the audio mutex I don't think the ALSA framework provides any kind of automatic synchronization within the control callbacks. We most likely need to ensure this manually, so add locking around all access to shared mutable data. In particular, bcm2835_audio_set_ctls() should probably always be called under our own audio lock. snd-bcm2835: Don't allow responses from VC to be interrupted by user signals There should always be a response, and retry after a signal interruption is not handled, so don't report we are interruptible. See: https://github.com/raspberrypi/linux/issues/1560 snd-bcm2835: Use bcm2835_hw params in preallocate 08 January 2017, 11:39:33 UTC
d1a929d cma: Add vc_cma driver to enable use of CMA Signed-off-by: popcornmix <popcornmix@gmail.com> vc_cma: Make the vc_cma area the default contiguous DMA area vc_cma: Provide empty functions when module is not built Providing empty functions saves the users from guarding the function call with an #if clause. Move __init markings from prototypes to functions. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> 08 January 2017, 11:39:32 UTC
e6d6442 mmc: Add MMC_QUIRK_ERASE_BROKEN for some cards Some SD cards have been found that corrupt data when small blocks are erased. Add a quirk to indicate that ERASE should not be used, and set it for cards of that type. Signed-off-by: Phil Elwell <phil@raspberrypi.org> mmc: Apply QUIRK_BROKEN_ERASE to other capacities Signed-off-by: Phil Elwell <phil@raspberrypi.org> mmc: Add card_quirks module parameter, log quirks Use mmc_block.card_quirks to override the quirks for all SD or MMC cards. The value is a bitfield using the bit positions defined in include/linux/mmc/card.h. If the module parameter is placed in the kernel command line (or bootargs) stored on the card then, assuming the device only has one SD card interface, the override effectively becomes card-specific. Signed-off-by: Phil Elwell <phil@raspberrypi.org> 08 January 2017, 11:39:31 UTC
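The card_quirks override described above is a plain bitfield, so composing a value amounts to OR-ing individual bits. A minimal sketch follows. The helper name `card_quirks_arg` and the bit positions 3 and 7 are placeholders chosen for illustration only; the real positions are whatever include/linux/mmc/card.h defines in the kernel being built. It also assumes the parameter accepts a hexadecimal value, as integer module parameters generally do.

```python
def card_quirks_arg(bit_positions):
    """Build an mmc_block.card_quirks=... command-line fragment.

    bit_positions: iterable of bit indices to set (hypothetical values here;
    look up the real ones in include/linux/mmc/card.h).
    """
    value = 0
    for bit in bit_positions:
        value |= 1 << bit  # set one quirk bit
    return "mmc_block.card_quirks=0x{:x}".format(value)

# Placeholder bits 3 and 7: (1 << 3) | (1 << 7) = 0x88
print(card_quirks_arg([3, 7]))  # mmc_block.card_quirks=0x88
```

Per the commit message, placing the resulting fragment in the kernel command line stored on the card makes the override effectively card-specific on a device with a single SD card interface.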
08fcc2f Adding bcm2835-sdhost driver, and an overlay to enable it BCM2835 has two SD card interfaces. This driver uses the other one. bcm2835-sdhost: Error handling fix, and code clarification bcm2835-sdhost: Adding overclocking option Allow a different clock speed to be substituted for a requested 50MHz. This option is exposed using the "overclock_50" DT parameter. Note that the sdhost interface is restricted to integer divisions of core_freq, and the highest sensible option for a core_freq of 250MHz is 84 (250/3 = 83.3MHz), the next being 125 (250/2) which is much too high. Use at your own risk. bcm2835-sdhost: Round up the overclock, so 62 works for 62.5MHz Also only warn once for each overclock setting. bcm2835-sdhost: Improve error handling and recovery 1) Expose the hw_reset method to the MMC framework, removing many internal calls by the driver. 2) Reduce overclock setting on error. 3) Increase timeout to cope with high capacity cards. 4) Add properties and parameters to control pio_limit and debug. 5) Reduce messages at probe time. bcm2835-sdhost: Further improve overclock back-off bcm2835-sdhost: Clear HBLC for PIO mode Also update pio_limit default in overlay README. bcm2835-sdhost: Add the ERASE capability See: https://github.com/raspberrypi/linux/issues/1076 bcm2835-sdhost: Ignore CRC7 for MMC CMD1 It seems that the sdhost interface returns CRC7 errors for CMD1, which is the MMC-specific SEND_OP_COND. Returning these errors to the MMC layer causes a downward spiral, but ignoring them seems to be harmless. bcm2835-mmc/sdhost: Remove ARCH_BCM2835 differences The bcm2835-mmc driver (and -sdhost driver that copied from it) contains code to handle SDIO interrupts in a threaded interrupt handler rather than waking the MMC framework thread. The change follows a patch from Russell King that adds the facility as the preferred way of working.
However, the new code path is only present in ARCH_BCM2835 builds, which I have taken to be a way of testing the waters rather than making the change across the board; I can't see any technical reason why it wouldn't be enabled for MACH_BCM270X builds. So this patch standardises on the ARCH_BCM2835 code, removing the old code paths. bcm2835-sdhost: Don't log timeout errors unless debug=1 The MMC card-discovery process generates timeouts. This is expected behaviour, so reporting it to the user serves no purpose. Suppress the reporting of timeout errors unless the debug flag is on. bcm2835-sdhost: Add workaround for odd behaviour on some cards For reasons not understood, the sdhost driver fails when reading sectors very near the end of some SD cards. The problem could be related to the similar issue that reading the final sector of any card as part of a multiple read never completes, and the workaround is an extension of the mechanism introduced to solve that problem which ensures those sectors are always read singly. bcm2835-sdhost: Major revision This is a significant revision of the bcm2835-sdhost driver. It improves on the original in a number of ways: 1) Through the use of CMD23 for reads it appears to avoid problems reading some sectors on certain high speed cards. 2) Better atomicity to prevent crashes. 3) Higher performance. 4) Activity logging included, for easier diagnosis in the event of a problem. Signed-off-by: Phil Elwell <phil@raspberrypi.org> bcm2835-sdhost: Restore ATOMIC flag to PIO sg mapping Allocation problems have been seen in a wireless driver, and this is the only change which might have been responsible. SQUASH: bcm2835-sdhost: Only claim one DMA channel With both MMC controllers enabled there are few DMA channels left. The bcm2835-sdhost driver only uses DMA in one direction at a time, so it doesn't need to claim two channels. 
See: https://github.com/raspberrypi/linux/issues/1327 Signed-off-by: Phil Elwell <phil@raspberrypi.org> bcm2835-sdhost: Workaround for "slow" sectors Some cards have been seen to cause timeouts after certain sectors are read. This workaround enforces a minimum delay between the stop after reading one of those sectors and a subsequent data command. Using CMD23 (SET_BLOCK_COUNT) avoids this problem, so good cards will not be penalised by this workaround. Signed-off-by: Phil Elwell <phil@raspberrypi.org> bcm2835-sdhost: Firmware manages the clock divisor The bcm2835-sdhost driver hands control of the CDIV clock divisor register to matching firmware, allowing it to adjust to a changing core clock. This removes the need to use the performance governor or to enable io_is_busy on the on-demand governor in order to get the best SD performance. N.B. As SD clocks must be an integer divisor of the core clock, it is possible that the SD clock for "turbo" mode can be different (even lower) than "normal" mode. Signed-off-by: Phil Elwell <phil@raspberrypi.org> bcm2835-sdhost: Reset the clock in task context Since reprogramming the clock can now involve a round-trip to the firmware it must not be done at atomic context, and a tasklet is not a task. Signed-off-by: Phil Elwell <phil@raspberrypi.org> bcm2835-sdhost: Don't exit cmd wait loop on error The FAIL flag can be set in the CMD register before command processing is complete, leading to spurious "failed to complete" errors. This has the effect of promoting harmless CRC7 errors during CMD1 processing into errors that can delay and even prevent booting. Also: 1) Convert the last KERN_ERROR message in the register dumping to KERN_INFO. 2) Remove an unnecessary reset call from bcm2835_sdhost_add_host. See: https://github.com/raspberrypi/linux/pull/1492 Signed-off-by: Phil Elwell <phil@raspberrypi.org> 08 January 2017, 11:39:30 UTC
70a6c60 MMC: added alternative MMC driver mmc: Disable CMD23 transfers on all cards Pending wire-level investigation of these types of transfers and associated errors on bcm2835-mmc, disable for now. Fallback of CMD18/CMD25 transfers will be used automatically by the MMC layer. Reported/Tested-by: Gellert Weisz <gellert@raspberrypi.org> mmc: bcm2835-mmc: enable DT support for all architectures Both ARCH_BCM2835 and ARCH_BCM270x are built with OF now. Enable Device Tree support for all architectures. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> mmc: bcm2835-mmc: fix probe error handling Probe error handling is broken in several places. Simplify error handling by using device managed functions. Replace pr_{err,info} with dev_{err,info}. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> bcm2835-mmc: Add locks when accessing sdhost registers bcm2835-mmc: Add range of debug options for slowing things down bcm2835-mmc: Add option to disable some delays bcm2835-mmc: Add option to disable MMC_QUIRK_BLK_NO_CMD23 bcm2835-mmc: Default to disabling MMC_QUIRK_BLK_NO_CMD23 bcm2835-mmc: Adding overclocking option Allow a different clock speed to be substituted for a requested 50MHz. This option is exposed using the "overclock_50" DT parameter. Note that the mmc interface is restricted to EVEN integer divisions of 250MHz, and the highest sensible option is 63 (250/4 = 62.5), the next being 125 (250/2) which is much too high. Use at your own risk. bcm2835-mmc: Round up the overclock, so 62 works for 62.5MHz Also only warn once for each overclock setting. mmc: bcm2835-mmc: Make available on ARCH_BCM2835 Make the bcm2835-mmc driver available for use on ARCH_BCM2835. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> BCM270x_DT: add bcm2835-mmc entry Add Device Tree entry for bcm2835-mmc. In non-DT mode, don't add the device in the board file.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org> bcm2835-mmc: Don't overwrite MMC capabilities from DT bcm2835-mmc: Don't override bus width capabilities from devicetree Take out the force setting of the MMC_CAP_4_BIT_DATA host capability so that the result read from devicetree via mmc_of_parse() is preserved. bcm2835-mmc: Only claim one DMA channel With both MMC controllers enabled there are few DMA channels left. The bcm2835-mmc driver only uses DMA in one direction at a time, so it doesn't need to claim two channels. See: https://github.com/raspberrypi/linux/issues/1327 Signed-off-by: Phil Elwell <phil@raspberrypi.org> 08 January 2017, 11:39:29 UTC
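The overclock figures quoted in the bcm2835-sdhost and bcm2835-mmc messages above follow from simple divisor arithmetic. A minimal sketch, assuming only what those messages state: sdhost divides core_freq by any integer, and bcm2835-mmc divides 250MHz by an even integer. The function names and the `max_div` cut-off are hypothetical; this is illustrative arithmetic, not driver code.

```python
import math

def achievable_clocks_mhz(core_mhz, even_only=False, max_div=8):
    """Return core_mhz / d for each allowed divisor d, fastest first.

    even_only=False models the sdhost rule (any integer divisor);
    even_only=True models the bcm2835-mmc rule (even divisors only).
    max_div is an arbitrary cut-off for this illustration.
    """
    step = 2 if even_only else 1
    return [core_mhz / d for d in range(step, max_div + 1, step)]

def overclock_setting(clock_mhz):
    """Smallest whole-MHz value that is not below the real clock."""
    return math.ceil(clock_mhz)

sdhost = achievable_clocks_mhz(250)               # divisors 1, 2, 3, ...
mmc = achievable_clocks_mhz(250, even_only=True)  # divisors 2, 4, 6, 8

print(overclock_setting(sdhost[2]))  # 250/3 = 83.3 -> 84
print(overclock_setting(mmc[1]))     # 250/4 = 62.5 -> 63
```

In both lists the next faster step is 250/2 = 125MHz, which the messages describe as much too high.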
6495203 dmaengine: Add support for BCM2708 Add support for DMA controller of BCM2708 as used in the Raspberry Pi. Currently it only supports cyclic DMA. Signed-off-by: Florian Meier <florian.meier@koalo.de> dmaengine: expand functionality by supporting scatter/gather transfers sdhci-bcm2708 and dma.c: fix for LITE channels DMA: fix cyclic LITE length overflow bug dmaengine: bcm2708: Remove chancnt affectations Mirror bcm2835-dma.c commit 9eba5536a7434c69d8c185d4bd1c70734d92287d: chancnt is already filled by dma_async_device_register, which uses the channel list to know how much channels there is. Since it's already filled, we can safely remove it from the drivers' probe function. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> dmaengine: bcm2708: overwrite dreq only if it is not set dreq is set when the DMA channel is fetched from Device Tree. slave_id is set using dmaengine_slave_config(). Only overwrite dreq with slave_id if it is not set. dreq/slave_id in the cyclic DMA case is not touched, because I don't have hardware to test with. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> dmaengine: bcm2708: do device registration in the board file Don't register the device in the driver. Do it in the board file. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> dmaengine: bcm2708: don't restrict DT support to ARCH_BCM2835 Both ARCH_BCM2835 and ARCH_BCM270x are built with OF now. Add Device Tree support to the non ARCH_BCM2835 case. Use the same driver name regardless of architecture. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> BCM270x_DT: add bcm2835-dma entry Add Device Tree entry for bcm2835-dma. The entry doesn't contain any resources since they are handled by the arch/arm/mach-bcm270x/dma.c driver. In non-DT mode, don't add the device in the board file. 
Signed-off-by: Noralf Trønnes <noralf@tronnes.org> bcm2708-dmaengine: Add debug options BCM270x: Add memory and irq resources to dmaengine device and DT Prepare for merging of the legacy DMA API arch driver dma.c with bcm2708-dmaengine by adding memory and irq resources both to platform file device and Device Tree node. Don't use BCM_DMAMAN_DRIVER_NAME so we don't have to include mach/dma.h Signed-off-by: Noralf Trønnes <noralf@tronnes.org> dmaengine: bcm2708: Merge with arch dma.c driver and disable dma.c Merge the legacy DMA API driver with bcm2708-dmaengine. This is done so we can use bcm2708_fb on ARCH_BCM2835 (mailbox driver is also needed). Changes to the dma.c code: - Use BIT() macro. - Cutdown some comments to one line. - Add mutex to vc_dmaman and use this, since the dev lock is locked during probing of the engine part. - Add global g_dmaman variable since drvdata is used by the engine part. - Restructure for readability: vc_dmaman_chan_alloc() vc_dmaman_chan_free() bcm_dma_chan_free() - Restructure bcm_dma_chan_alloc() to simplify error handling. - Use device irq resources instead of hardcoded bcm_dma_irqs table. - Remove dev_dmaman_register() and code it directly. - Remove dev_dmaman_deregister() and code it directly. - Simplify bcm_dmaman_probe() using devm_* functions. - Get dmachans from DT if available. - Keep 'dma.dmachans' module argument name for backwards compatibility. Make it available on ARCH_BCM2835 as well. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> dmaengine: bcm2708: set residue_granularity field bcm2708-dmaengine supports residue reporting at burst level but didn't report this via the residue_granularity field. Without this field set properly we get playback issues with I2S cards. 
dmaengine: bcm2708-dmaengine: Fix memory leak when stopping a running transfer bcm2708-dmaengine: Use more DMA channels (but not 12) 1) Only the bcm2708_fb drivers uses the legacy DMA API, and it requires a BULK-capable channel, so all other types (FAST, NORMAL and LITE) can be made available to the regular DMA API. 2) DMA channels 11-14 share an interrupt. The driver can't handle this, so don't use channels 12-14 (12 was used, probably because it appears to have an interrupt, but in reality that interrupt is for activity on ANY channel). This may explain a lockup encountered when running out of DMA channels. The combined effect of this patch is to leave 7 DMA channels available + channel 0 for bcm2708_fb via the legacy API. See: https://github.com/raspberrypi/linux/issues/1110 https://github.com/raspberrypi/linux/issues/1108 dmaengine: bcm2708: Make legacy API available for bcm2835-dma bcm2708_fb uses the legacy DMA API, so in order to start using bcm2835-dma, bcm2835-dma has to support the legacy API. Make this possible by exporting bcm_dmaman_probe() and bcm_dmaman_remove(). Signed-off-by: Noralf Trønnes <noralf@tronnes.org> dmaengine: bcm2708: Change DT compatible string Both bcm2835-dma and bcm2708-dmaengine have the same compatible string. So change compatible to "brcm,bcm2708-dma". Signed-off-by: Noralf Trønnes <noralf@tronnes.org> dmaengine: bcm2708: Remove driver but keep legacy API Dropping non-DT support means we don't need this driver, but we still need the legacy DMA API. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> bcm2708-dmaengine - Fix arm64 portability/build issues 08 January 2017, 11:39:29 UTC
67d7cf8 bcm2708 framebuffer driver Signed-off-by: popcornmix <popcornmix@gmail.com> bcm2708_fb : Implement blanking support using the mailbox property interface bcm2708_fb: Add pan and vsync controls bcm2708_fb: DMA acceleration for fb_copyarea Based on http://www.raspberrypi.org/phpBB3/viewtopic.php?p=62425#p62425 Also used Simon's dmaer_master module as a reference for tweaking DMA settings for better performance. For now busylooping only. IRQ support might be added later. With non-overclocked Raspberry Pi, the performance is ~360 MB/s for simple copy or ~260 MB/s for two-pass copy (used when dragging windows to the right). In the case of using DMA channel 0, the performance improves to ~440 MB/s. For comparison, VFP optimized CPU copy can only do ~114 MB/s in the same conditions (hindered by reading uncached source buffer). Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com> bcm2708_fb: report number of dma copies Add a counter (exported via debugfs) reporting the number of dma copies that the framebuffer driver has done, in order to help evaluate different optimization strategies. Signed-off-by: Luke Diamand <luked@broadcom.com> bcm2708_fb: use IRQ for DMA copies The copyarea ioctl() uses DMA to speed things along. This was busy-waiting for completion. This change supports using an interrupt instead for larger transfers. For small transfers, busy-waiting is still likely to be faster. Signed-off-by: Luke Diamand <luke@diamand.org> bcm2708: Make ioctl logging quieter video: fbdev: bcm2708_fb: Don't panic on error No need to panic the kernel if the video driver fails. Just print a message and return an error. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> fbdev: bcm2708_fb: Add ARCH_BCM2835 support Add Device Tree support. Pass the device to dma_alloc_coherent() in order to get the correct bus address on ARCH_BCM2835. Use the new DMA legacy API header file. Including <mach/platform.h> is not necessary. 
Signed-off-by: Noralf Trønnes <noralf@tronnes.org> BCM270x_DT: Add bcm2708-fb device Add bcm2708-fb to Device Tree and don't add the platform device when booting in DT mode. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> 08 January 2017, 11:39:28 UTC
11691e4 Add dwc_otg driver Signed-off-by: popcornmix <popcornmix@gmail.com> usb: dwc: fix lockdep false positive Signed-off-by: Kari Suvanto <karis79@gmail.com> usb: dwc: fix inconsistent lock state Signed-off-by: Kari Suvanto <karis79@gmail.com> Add FIQ patch to dwc_otg driver. Enable with dwc_otg.fiq_fix_enable=1. Should give about 10% more ARM performance. Thanks to Gordon and Costas Avoid dynamic memory allocation for channel lock in USB driver. Thanks ddv2005. Add NAK holdoff scheme. Enabled by default, disable with dwc_otg.nak_holdoff_enable=0. Thanks gsh Make sure we wait for the reset to finish dwc_otg: fix bug in dwc_otg_hcd.c resulting in silent kernel memory corruption, escalating to OOPS under high USB load. dwc_otg: Fix unsafe access of QTD during URB enqueue In dwc_otg_hcd_urb_enqueue during qtd creation, it was possible that the transaction could complete almost immediately after the qtd was assigned to a host channel during URB enqueue, which meant the qtd pointer was no longer valid having been completed and removed. Usually, this resulted in an OOPS during URB submission. By predetermining whether transactions need to be queued or not, this unsafe pointer access is avoided. This bug was only evident on the Pi model A where a device was attached that had no periodic endpoints (e.g. USB pendrive or some wlan devices). dwc_otg: Fix incorrect URB allocation error handling If the memory allocation for a dwc_otg_urb failed, the kernel would OOPS because for some reason a member of the *unallocated* struct was set to zero. Error handling changed to fail correctly. dwc_otg: fix potential use-after-free case in interrupt handler If a transaction had previously aborted, certain interrupts are enabled to track error counts and reset where necessary. On IN endpoints the host generates an ACK interrupt near-simultaneously with completion of transfer. 
In the case where this transfer had previously had an error, this results in a use-after-free on the QTD memory space with a 1-byte length being overwritten to 0x00. dwc_otg: add handling of SPLIT transaction data toggle errors Previously a data toggle error on packets from a USB1.1 device behind a TT would result in the Pi locking up as the driver never handled the associated interrupt. Patch adds basic retry mechanism and interrupt acknowledgement to cater for either a chance toggle error or for devices that have a broken initial toggle state (FT8U232/FT232BM). dwc_otg: implement tasklet for returning URBs to usbcore hcd layer The dwc_otg driver interrupt handler for transfer completion will spend a very long time with interrupts disabled when a URB is completed - this is because usb_hcd_giveback_urb is called from within the handler which for a USB device driver with complicated processing (e.g. webcam) will take an exorbitant amount of time to complete. This results in missed completion interrupts for other USB packets which lead to them being dropped due to microframe overruns. This patch splits returning the URB to the usb hcd layer into a high-priority tasklet. This will have most benefit for isochronous IN transfers but will also have incidental benefit where multiple periodic devices are active at once. dwc_otg: fix NAK holdoff and allow on split transactions only This corrects a bug where if a single active non-periodic endpoint had at least one transaction in its qh, on frnum == MAX_FRNUM the qh would get skipped and never get queued again. This would result in a silent device until error detection (automatic or otherwise) would either reset the device or flush and requeue the URBs. Additionally the NAK holdoff was enabled for all transactions - this would potentially stall a HS endpoint for 1ms if a previous error state enabled this interrupt and the next response was a NAK. Fix so that only split transactions get held off. 
dwc_otg: Call usb_hcd_unlink_urb_from_ep with lock held in completion handler usb_hcd_unlink_urb_from_ep must be called with the HCD lock held. Calling it asynchronously in the tasklet was not safe (regression in c4564d4a1a0a9b10d4419e48239f5d99e88d2667). This change unlinks it from the endpoint prior to queueing it for handling in the tasklet, and also adds a check to ensure the urb is OK to be unlinked before doing so. NULL pointer dereference kernel oopses had been observed in usb_hcd_giveback_urb when a USB device was unplugged/replugged during data transfer. This effect was reproduced using automated USB port power control, hundreds of replug events were performed during active transfers to confirm that the problem was eliminated. USB fix using a FIQ to implement split transactions This commit adds a FIQ implementation that schedules the split transactions using a FIQ so we don't get held off by the interrupt latency of Linux dwc_otg: fix device attributes and avoid kernel warnings on boot dcw_otg: avoid logging function that can cause panics See: https://github.com/raspberrypi/firmware/issues/21 Thanks to cleverca22 for fix dwc_otg: mask correct interrupts after transaction error recovery The dwc_otg driver will unmask certain interrupts on a transaction that previously halted in the error state in order to reset the QTD error count. The various fine-grained interrupt handlers do not consider that other interrupts besides themselves were unmasked. By disabling the two other interrupts only ever enabled in DMA mode for this purpose, we can avoid unnecessary function calls in the IRQ handler. This will also prevent an unnecessary FIQ interrupt from being generated if the FIQ is enabled. dwc_otg: fiq: prevent FIQ thrash and incorrect state passing to IRQ In the case of a transaction to a device that had previously aborted due to an error, several interrupts are enabled to reset the error count when a device responds.
This has the side-effect of making the FIQ thrash because the hardware will generate multiple instances of a NAK on an IN bulk/interrupt endpoint and multiple instances of ACK on an OUT bulk/interrupt endpoint. Make the FIQ mask and clear the associated interrupts. Additionally, on non-split transactions make sure that only unmasked interrupts are cleared. This caused a hard-to-trigger but serious race condition when you had the combination of an endpoint awaiting error recovery and a transaction completed on an endpoint - due to the sequencing and timing of interrupts generated by the dwc_otg core, it was possible to confuse the IRQ handler. Fix function tracing dwc_otg: whitespace cleanup in dwc_otg_urb_enqueue dwc_otg: prevent OOPSes during device disconnects The dwc_otg_urb_enqueue function is thread-unsafe. In particular the access of urb->hcpriv, usb_hcd_link_urb_to_ep, dwc_otg_urb->qtd and friends does not occur within a critical section and so if a device was unplugged during activity there was a high chance that the usbcore hub_thread would try to disable the endpoint with partially- formed entries in the URB queue. This would result in BUG() or null pointer dereferences. Fix so that access of urb->hcpriv, enqueuing to the hardware and adding to usbcore endpoint URB lists is contained within a single critical section. dwc_otg: prevent BUG() in TT allocation if hub address is > 16 A fixed-size array is used to track TT allocation. This was previously set to 16 which caused a crash because dwc_otg_hcd_allocate_port would read past the end of the array. This was hit if a hub was plugged in which enumerated as addr > 16, due to previous device resets or unplugs. Also add #ifdef FIQ_DEBUG around hcd->hub_port_alloc[], which grows to a large size if 128 hub addresses are supported. This field is for debug only for tracking which frame an allocate happened in. 
dwc_otg: make channel halts with unknown state less damaging If the IRQ received a channel halt interrupt through the FIQ with no other bits set, the IRQ would not release the host channel and never complete the URB. Add catchall handling to treat as a transaction error and retry. dwc_otg: fiq_split: use TTs with more granularity This fixes certain issues with split transaction scheduling. - Isochronous multi-packet OUT transactions now hog the TT until they are completed - this prevents hubs aborting transactions if they get a periodic start-split out-of-order - Don't perform TT allocation on non-periodic endpoints - this allows simultaneous use of the TT's bulk/control and periodic transaction buffers This commit will mainly affect USB audio playback. dwc_otg: fix potential sleep while atomic during urb enqueue Fixes a regression introduced with eb1b482a. Kmalloc called from dwc_otg_hcd_qtd_add / dwc_otg_hcd_qtd_create did not always have the GFP_ATOMIC flag set. Force this flag when inside the larger critical section. dwc_otg: make fiq_split_enable imply fiq_fix_enable Failing to set up the FIQ correctly would result in "IRQ 32: nobody cared" errors in dmesg. dwc_otg: prevent crashes on host port disconnects Fix several issues resulting in crashes or inconsistent state if a Model A root port was disconnected.
- Clean up queue heads properly in kill_urbs_in_qh_list by removing the empty QHs from the schedule lists - Set the halt status properly to prevent IRQ handlers from using freed memory - Add fiq_split related cleanup for saved registers - Make microframe scheduling reclaim host channels if active during a disconnect - Abort URBs with -ESHUTDOWN status response, informing device drivers so they respond in a more correct fashion and don't try to resubmit URBs - Prevent IRQ handlers from attempting to handle channel interrupts if the associated URB was dequeued (and the driver state was cleared) dwc_otg: prevent leaking URBs during enqueue A dwc_otg_urb would get leaked if the HCD enqueue function failed for any reason. Free the URB at the appropriate points. dwc_otg: Enable NAK holdoff for control split transactions Certain low-speed devices take a very long time to complete a data or status stage of a control transaction, producing NAK responses until they complete internal processing - the USB2.0 spec limit is up to 500mS. This causes the same type of interrupt storm as seen with USB-serial dongles prior to c8edb238. In certain circumstances, usually while booting, this interrupt storm could cause SD card timeouts. dwc_otg: Fix for occasional lockup on boot when doing a USB reset dwc_otg: Don't issue traffic to LS devices in FS mode Issuing low-speed packets when the root port is in full-speed mode causes the root port to stop responding. Explicitly fail when enqueuing URBs to a LS endpoint on a FS bus. Fix ARM architecture issue with local_irq_restore() If local_fiq_enable() is called before a local_irq_restore(flags) where the flags variable has the F bit set, the FIQ will be erroneously disabled. Fixup arch_local_irq_restore to avoid trampling the F bit in CPSR. Also fix some of the hacks previously implemented for previous dwc_otg incarnations. dwc_otg: fiq_fsm: Base commit for driver rewrite This commit removes the previous FIQ fixes entirely and adds fiq_fsm. 
This rewrite features much more complete support for split transactions and takes into account several OTG hardware bugs. High-speed isochronous transactions are also capable of being performed by fiq_fsm. All driver options have been removed and replaced with: - dwc_otg.fiq_enable (bool) - dwc_otg.fiq_fsm_enable (bool) - dwc_otg.fiq_fsm_mask (bitmask) - dwc_otg.nak_holdoff (unsigned int) Defaults are specified such that fiq_fsm behaves similarly to the previously implemented FIQ fixes. fiq_fsm: Push error recovery into the FIQ when fiq_fsm is used If the transfer associated with a QTD failed due to a bus error, the HCD would retry the transfer up to 3 times (implementing the USB2.0 three-strikes retry in software). Due to the masking mechanism used by fiq_fsm, it is only possible to pass a single interrupt through to the HCD per-transfer. In this instance host channels would fall off the radar because the error reset would function, but the subsequent channel halt would be lost. Push the error count reset into the FIQ handler. fiq_fsm: Implement timeout mechanism For full-speed endpoints with a large packet size, interrupt latency runs the risk of the FIQ starting a transaction too late in a full-speed frame. If the device is still transmitting data when EOF2 for the downstream frame occurs, the hub will disable the port. This change is not reflected in the hub status endpoint and the device becomes unresponsive. Prevent high-bandwidth transactions from being started too late in a frame. The mechanism is not guaranteed: a combination of bit stuffing and hub latency may still result in a device overrunning. fiq_fsm: fix bounce buffer utilisation for Isochronous OUT Multi-packet isochronous OUT transactions were subject to a few boundary bugs. Fix them. Audio playback is now much more robust: however, an issue stands with devices that have adaptive sinks - ALSA plays samples too fast.
dwc_otg: Return full-speed frame numbers in HS mode The frame counter increments on every *microframe* in high-speed mode. Most device drivers expect this number to be in full-speed frames - this caused considerable confusion to e.g. snd_usb_audio which uses the frame counter to estimate the number of samples played. fiq_fsm: save PID on completion of interrupt OUT transfers Also add edge case handling for interrupt transports. Note that for periodic split IN, data toggles are unimplemented in the OTG host hardware - it unconditionally accepts any PID. fiq_fsm: add missing case for fiq_fsm_tt_in_use() Certain combinations of bitrate and endpoint activity could result in a periodic transaction erroneously getting started while the previous Isochronous OUT was still active. fiq_fsm: clear hcintmsk for aborted transactions Prevents the FIQ from erroneously handling interrupts on a timed out channel. fiq_fsm: enable by default fiq_fsm: fix dequeues for non-periodic split transactions If a dequeue happened between the SSPLIT and CSPLIT phases of the transaction, the HCD would never receive an interrupt. fiq_fsm: Disable by default fiq_fsm: Handle HC babble errors The HCTSIZ transfer size field raises a babble interrupt if the counter wraps. Handle the resulting interrupt in this case. dwc_otg: fix interrupt registration for fiq_enable=0 Additionally make the module parameter conditional for wherever hcd->fiq_state is touched. fiq_fsm: Enable by default dwc_otg: Fix various issues with root port and transaction errors Process the host port interrupts correctly (and don't trample them). Root port hotplug now functional again. Fix a few thinkos with the transaction error passthrough for fiq_fsm. fiq_fsm: Implement hack for Split Interrupt transactions Hubs aren't too picky about which endpoint we send Control type split transactions to. 
By treating Interrupt transfers as Control, it is possible to use the non-periodic queue in the OTG core as well as the non-periodic FIFOs in the hub itself. This massively reduces the microframe exclusivity/contention that periodic split transactions otherwise have to enforce. It goes without saying that this is a fairly egregious USB specification violation, but it works. Original idea by Hans Petter Selasky @ FreeBSD.org. dwc_otg: FIQ support on SMP. Set up FIQ stack and handler on Core 0 only. dwc_otg: introduce fiq_fsm_spin(un|)lock() SMP safety for the FIQ relies on register read-modify write cycles being completed in the correct order. Several places in the DWC code modify registers also touched by the FIQ. Protect these by a bare-bones lock mechanism. This also makes it possible to run the FIQ and IRQ handlers on different cores. fiq_fsm: fix build on bcm2708 and bcm2709 platforms dwc_otg: put some barriers back where they should be for UP bcm2709/dwc_otg: Setup FIQ on core 1 if >1 core active dwc_otg: fixup read-modify-write in critical paths Be more careful about read-modify-write on registers that the FIQ also touches. Guard fiq_fsm_spin_lock with fiq_enable check fiq_fsm: Falling out of the state machine isn't fatal This edge case can be hit if the port is disabled while the FIQ is in the middle of a transaction. Make the effects less severe. Also get rid of the useless return value. squash: dwc_otg: Allow to build without SMP usb: core: make overcurrent messages more prominent Hub overcurrent messages are more serious than "debug". Increase loglevel. usb: dwc_otg: Don't use dma_to_virt() Commit 6ce0d20 changes dma_to_virt() which breaks this driver. Open code the old dma_to_virt() implementation to work around this. Limit the use of __bus_to_virt() to cases where transfer_buffer_length is set and transfer_buffer is not set. This is done to increase the chance that this driver will also work on ARCH_BCM2835. 
transfer_buffer should not be NULL if the length is set, but the comment in the code indicates that there are situations where this might happen. drivers/usb/isp1760/isp1760-hcd.c also has a similar comment pointing to a possible: 'usb storage / SCSI bug'. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> dwc_otg: Fix crash when fiq_enable=0 dwc_otg: fiq_fsm: Make high-speed isochronous strided transfers work properly Certain low-bandwidth high-speed USB devices (specialist audio devices, compressed-frame webcams) have packet intervals > 1 microframe. Stride these transfers in the FIQ by using the start-of-frame interrupt to restart the channel at the right time. dwc_otg: Force host mode to fix incorrect compute module boards dwc_otg: Add ARCH_BCM2835 support Signed-off-by: Noralf Trønnes <noralf@tronnes.org> dwc_otg: Simplify FIQ irq number code Dropping ATAGS means we can simplify the FIQ irq number code. Also add error checking on the returned irq number. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> dwc_otg: Remove duplicate gadget probe/unregister function dwc_otg: Properly set the HFIR Douglas Anderson reported: According to the most up to date version of the dwc2 databook, the FRINT field of the HFIR register should be programmed to: * 125 us * (PHY clock freq for HS) - 1 * 1000 us * (PHY clock freq for FS/LS) - 1 This is opposed to older versions of the doc that claimed it should be: * 125 us * (PHY clock freq for HS) * 1000 us * (PHY clock freq for FS/LS) and reported lower timing jitter on a USB analyser dcw_otg: trim xfer length when buffer larger than allocated size is received dwc_otg: Don't free qh align buffers in atomic context dwc_otg: Enable the hack for Split Interrupt transactions by default dwc_otg.fiq_fsm_mask=0xF has long been a suggestion for users with audio stutters or other USB bandwidth issues. So far we are aware of many success stories but no failure caused by this setting. Make it a default to learn more. 
See: https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=70437 Signed-off-by: popcornmix <popcornmix@gmail.com> dwc_otg: Use kzalloc when suitable 08 January 2017, 11:39:26 UTC
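The "Properly set the HFIR" note in the dwc_otg entry above reduces to simple arithmetic. A minimal sketch in C, assuming the PHY clock is given in MHz; the helper name `hfir_frint` and the example clock values mentioned afterwards (60 MHz for high speed, 48 MHz for full/low speed) are illustrative assumptions, not taken from the driver:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a dwc_otg function): compute the FRINT value
 * described in the commit message, i.e. the frame interval expressed in
 * PHY clock cycles, minus one.
 *   high speed:     125 us  * clock - 1
 *   full/low speed: 1000 us * clock - 1
 * With the clock in MHz, "microseconds * MHz" is already a cycle count.
 */
uint32_t hfir_frint(uint32_t phy_clk_mhz, bool high_speed)
{
    uint32_t interval_us = high_speed ? 125u : 1000u;

    return interval_us * phy_clk_mhz - 1u;
}
```

With the assumed clocks, `hfir_frint(60, true)` evaluates to 7499 and `hfir_frint(48, false)` to 47999, where the older databook formula quoted in the message would give 7500 and 48000.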
b443284 squash: include ARCH_BCM2708 / ARCH_BCM2709 08 January 2017, 11:39:25 UTC
202712d reboot: Use power off rather than busy spinning when halt is requested 08 January 2017, 11:39:24 UTC
707b83e Main bcm2708/bcm2709 linux port Signed-off-by: popcornmix <popcornmix@gmail.com> Signed-off-by: Noralf Trønnes <noralf@tronnes.org> 08 January 2017, 11:39:24 UTC
f9397b1 ARM: bcm2708: Enable building power domain driver. Signed-off-by: Eric Anholt <eric@anholt.net> 08 January 2017, 11:39:23 UTC
b4d02ed rtc: Add SPI alias for pcf2123 driver Without this alias, Device Tree won't cause the driver to be loaded. See: https://github.com/raspberrypi/linux/pull/1510 08 January 2017, 11:39:22 UTC
d94cf3e Enable upstream BCM2835 auxiliary mini UART support 08 January 2017, 11:39:21 UTC
a73a9c9 clk: bcm2835: Mark the CM SDRAM clock's parent as critical While the SDRAM is being driven by its dedicated PLL most of the time, there is a little loop running in the firmware that periodically turns on the CM SDRAM clock (using its pre-initialized parent) and switches SDRAM to using the CM clock to do PVT recalibration. This avoids system hangs if we choose SDRAM's parent for some other clock, then disable that clock. Signed-off-by: Eric Anholt <eric@anholt.net> 08 January 2017, 11:39:20 UTC
dac6640 clk: bcm2835: Mark GPIO clocks enabled at boot as critical. These divide off of PLLD_PER and are used for the ethernet and wifi PHYs source PLLs. Neither of them is currently represented by a phy device that would grab the clock for us. This keeps other drivers from killing the networking PHYs when they disable their own clocks and trigger PLLD_PER's refcount going to 0. v2: Skip marking as critical if they aren't on at boot. Signed-off-by: Eric Anholt <eric@anholt.net> 08 January 2017, 11:39:19 UTC
54ed0c4 clk: bcm2835: Mark the VPU clock as critical The VPU clock is also the clock for our AXI bus, so we really can't disable it. This might have happened during boot if, for example, uart1 (aux_uart clock) probed and was then disabled before the other consumers of the VPU clock had probed. v2: Rewrite to use a .flags in bcm2835_clock_data, since other clocks will need this too. Signed-off-by: Eric Anholt <eric@anholt.net> 08 January 2017, 11:39:18 UTC
047bec3 firmware: Updated mailbox header 08 January 2017, 11:39:17 UTC
6e0442e dmaengine: bcm2835: Load driver early and support legacy API Load driver early since at least bcm2708_fb doesn't support deferred probing and even if it did, we don't want the video driver deferred. Support the legacy DMA API which is needed by bcm2708_fb. Don't mask out channel 2. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> 08 January 2017, 11:39:16 UTC
023bb0f ARM: bcm2835: Set Serial number and Revision The VideoCore bootloader passes in Serial number and Revision number through Device Tree. Make these available to userspace through /proc/cpuinfo. Mainline status: There is a commit in linux-next that standardizes passing the serial number through Device Tree (string: /serial-number): ARM: 8355/1: arch: Show the serial number from devicetree in cpuinfo There was an attempt to do the same with the revision number, but it didn't get in: [PATCH v2 1/2] arm: devtree: Set system_rev from DT revision Signed-off-by: Noralf Trønnes <noralf@tronnes.org> 08 January 2017, 11:39:15 UTC
5459e41 spi-bcm2835: Disable forced software CS Select software CS in bcm2708_common.dtsi, and disable the automatic conversion in the driver to allow hardware CS to be re-enabled with an overlay. See: https://github.com/raspberrypi/linux/issues/1547 Signed-off-by: Phil Elwell <phil@raspberrypi.org> 08 January 2017, 11:39:14 UTC
c503b99 spi-bcm2835: Support pin groups other than 7-11 The spi-bcm2835 driver automatically uses GPIO chip-selects due to some unreliability of the native ones. In doing so it chooses the same pins as the native chip-selects would use, but the existing code always uses pins 7 and 8, wherever the SPI function is mapped. Search the pinctrl group assigned to the driver for pins that correspond to native chip-selects, and use those for GPIO chip-selects. Signed-off-by: Phil Elwell <phil@raspberrypi.org> 08 January 2017, 11:39:13 UTC
21a0217 pinctrl-bcm2835: Return pins to inputs when freed When dynamically unloading overlays, it is important that freed pins are restored to being inputs to prevent functions from being enabled in multiple places at once. Signed-off-by: Phil Elwell <phil@raspberrypi.org> 08 January 2017, 11:39:13 UTC
e898b00 pinctrl-bcm2835: Only request the interrupts listed in the DTB Although the GPIO controller can generate three interrupts (four counting the common one), the device tree files currently only specify two. In the absence of the third, simply don't register that interrupt (as opposed to registering 0), which has the effect of making it impossible to generate interrupts for GPIOs 46-53 which, since they share pins with the SD card interface, is unlikely to be a problem. 08 January 2017, 11:39:12 UTC
d5b7938 pinctrl-bcm2835: Fix interrupt handling for GPIOs 28-31 and 46-53 Contrary to the documentation, the BCM2835 GPIO controller actually has four interrupt lines - one each for the three IRQ groups and one common. Rather confusingly, the GPIO interrupt groups don't correspond directly with the GPIO control banks. Instead, GPIOs 0-27 generate IRQ GPIO0, 28-45 GPIO1 and 46-53 GPIO2. Awkwardly, the GPIOs for IRQ GPIO1 straddle two 32-entry GPIO banks, so it is cleaner to split out a function to process the interrupts for a single GPIO bank. This bug has only just been observed because GPIOs above 27 can only be accessed on an old Raspberry Pi with the optional P5 header fitted, where the pins are often used for I2S instead. 08 January 2017, 11:39:11 UTC
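The mismatch between IRQ groups and control banks described in the entry above can be sketched in a few lines of C. The GPIO ranges come from the commit message; the function names and the `BCM2835_NUM_GPIOS` macro are hypothetical and do not claim to match the pinctrl driver's own code:

```c
/* GPIOs 0-53, per the ranges given in the commit message. */
#define BCM2835_NUM_GPIOS 54

/*
 * Hypothetical helper: IRQ group for a GPIO.
 *   0-27  -> group 0 (IRQ GPIO0)
 *   28-45 -> group 1 (IRQ GPIO1)
 *   46-53 -> group 2 (IRQ GPIO2)
 * Returns -1 for an out-of-range GPIO number.
 */
int gpio_irq_group(unsigned int gpio)
{
    if (gpio >= BCM2835_NUM_GPIOS)
        return -1;
    if (gpio <= 27)
        return 0;
    if (gpio <= 45)
        return 1;
    return 2;
}

/* Index of the 32-entry control bank that holds a GPIO. */
unsigned int gpio_bank(unsigned int gpio)
{
    return gpio / 32;
}
```

GPIOs 31 and 32 both fall in IRQ group 1, yet `gpio_bank` places them in banks 0 and 1 respectively, which is the straddling the message refers to.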
d2d8a48 pinctrl-bcm2835: Set base to 0 to give expected gpio numbering Signed-off-by: Noralf Tronnes <notro@tronnes.org> 08 January 2017, 11:39:10 UTC
a4e6d29 serial: 8250: Don't crash when nr_uarts is 0 08 January 2017, 11:39:09 UTC
af3c848 spidev: Add "spidev" compatible string to silence warning See: https://github.com/raspberrypi/linux/issues/1054 08 January 2017, 11:39:08 UTC
c84f419 irqchip: irq-bcm2835: Add 2836 FIQ support Signed-off-by: Noralf Trønnes <noralf@tronnes.org> 08 January 2017, 11:39:07 UTC
a21e7d2 irqchip: bcm2835: Add FIQ support Add a duplicate irq range with an offset on the hwirq's so the driver can detect that enable_fiq() is used. Tested with downstream dwc_otg USB controller driver. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> Reviewed-by: Eric Anholt <eric@anholt.net> Acked-by: Stephen Warren <swarren@wwwdotorg.org> 08 January 2017, 11:39:06 UTC
dab8f44 irq-bcm2836: Prevent spurious interrupts, and trap them early The old arch-specific IRQ macros included a dsb to ensure the write to clear the mailbox interrupt completed before returning from the interrupt. The BCM2836 irqchip driver needs the same precaution to avoid spurious interrupts. Spurious interrupts are still possible for other reasons, though, so trap them early. 08 January 2017, 11:39:05 UTC
0b4b9f7 BCM2835_DT: Fix I2S register map 08 January 2017, 11:39:04 UTC
897d9e5 mm: Remove the PFN busy warning See commit dae803e165a11bc88ca8dbc07a11077caf97bbcb -- the warning is expected sometimes when using CMA. However, that commit still spams my kernel log with these warnings. Signed-off-by: Eric Anholt <eric@anholt.net> 08 January 2017, 11:39:03 UTC
03befd1 Protect __release_resource against resources without parents Without this patch, removing a device tree overlay can crash here. Signed-off-by: Phil Elwell <phil@raspberrypi.org> 08 January 2017, 11:39:02 UTC
adf3852 serial: Take care starting a hung-up tty's port tty_port_hangup sets a port's tty field to NULL (holding the port lock), but uart_tx_stopped, called from __uart_start (with the port lock), uses the tty field without checking for NULL. Change uart_tx_stopped to treat a NULL tty field as another stopped indication. Signed-off-by: Phil Elwell <phil@raspberrypi.org> 08 January 2017, 11:39:01 UTC
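The idea behind the serial fix above, treating a missing tty as "stopped" rather than dereferencing it, can be illustrated with stand-in types. This is only a sketch: `struct sketch_tty`, `struct sketch_port` and `sketch_tx_stopped` are hypothetical simplifications, not the kernel's `tty_struct`, `uart_port` or `uart_tx_stopped`:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumed layout). */
struct sketch_tty {
    bool stopped;
};

struct sketch_port {
    struct sketch_tty *tty; /* set to NULL by a hangup */
};

/*
 * Report whether transmission is stopped. A NULL tty (the port has been
 * hung up) counts as stopped, so the pointer is never dereferenced.
 */
bool sketch_tx_stopped(const struct sketch_port *port)
{
    const struct sketch_tty *tty = port->tty;

    if (tty == NULL)
        return true;

    return tty->stopped;
}
```

A caller that starts transmission only when this returns false will simply skip a hung-up port.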
602eb11 smsc95xx: Experimental: Enable turbo_mode and packetsize=2560 by default See: http://forum.kodi.tv/showthread.php?tid=285288 08 January 2017, 11:39:00 UTC
df0b1da Allow mac address to be set in smsc95xx Signed-off-by: popcornmix <popcornmix@gmail.com> 08 January 2017, 11:38:59 UTC
5252c24 add smsc95xx packetsize module_param Signed-off-by: Sam Nazarko <email@samnazarko.co.uk> 08 January 2017, 11:38:58 UTC
ac272c4 smsc95xx: Disable turbo mode by default 08 January 2017, 11:38:57 UTC
1c52b29 smsx95xx: fix crimes against truesize smsc95xx is adjusting truesize when it shouldn't, and following a recent patch from Eric this is now triggering warnings. This patch stops smsc95xx from changing truesize. Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com> 08 January 2017, 11:38:56 UTC
636d553 Revert "SUNRPC: Make NFS swap work with multipath" This reverts commit 15001e5a7e1e207b6bd258cd8f187814cd15b6dc. 08 January 2017, 11:38:55 UTC
c65ed08 Linux 4.8.16 06 January 2017, 10:16:53 UTC
6458972 driver core: fix race between creating/querying glue dir and its cleanup commit cebf8fd16900fdfd58c0028617944f808f97fe50 upstream. The global mutex of 'gdp_mutex' is used to serialize creating/querying glue dir and its cleanup. Turns out it isn't a perfect way because part(kobj_kset_leave()) of the actual cleanup action() is done inside the release handler of the glue dir kobject. That means gdp_mutex has to be held before releasing the last reference count of the glue dir kobject. This patch moves glue dir's cleanup after kobject_del() in device_del() for avoiding the race. Cc: Yijing Wang <wangyijing@huawei.com> Reported-by: Chandra Sekhar Lingutla <clingutla@codeaurora.org> Signed-off-by: Ming Lei <ming.lei@canonical.com> Cc: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:25 UTC
f199bdb Revert "netfilter: move nat hlist_head to nf_conn" This reverts commit 7c9664351980aaa6a4b8837a314360b3a4ad382a as it is not working properly. Please move to 4.9 to get the full fix. Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:25 UTC
99d6d4e Revert "netfilter: nat: convert nat bysrc hash to rhashtable" This reverts commit 870190a9ec9075205c0fa795a09fa931694a3ff1 as it is not working properly. Please move to 4.9 to get the full fix. Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:25 UTC
7742256 arm64: mark reserved memblock regions explicitly in iomem commit e7cd190385d17790cc3eb3821b1094b00aacf325 upstream. Kdump(kexec-tools) parses /proc/iomem to identify all the memory regions on the system. Since the current kernel names "nomap" regions, like UEFI runtime services code/data, as "System RAM," kexec-tools sets up elf core header to include them in a crash dump file (/proc/vmcore). Then crash dump kernel parses UEFI memory map again, re-marks those regions as "nomap" and does not create a memory mapping for them unlike the other areas of System RAM. In this case, copying /proc/vmcore through copy_oldmem_page() on crash dump kernel will end up with a kernel abort, as reported in [1]. This patch names all the "nomap" regions explicitly as "reserved" so that we can exclude them from a crash dump file. acpi_os_ioremap() must also be modified because those regions have WB attributes [2]. Apart from kdump, this change also matches x86's use of acpi (and /proc/iomem). [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/448186.html [2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/450089.html Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: James Morse <james.morse@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Matthias Brugger <mbrugger@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:25 UTC
587e89b xfs: set AGI buffer type in xlog_recover_clear_agi_bucket commit 6b10b23ca94451fae153a5cc8d62fd721bec2019 upstream. xlog_recover_clear_agi_bucket didn't set the type to XFS_BLFT_AGI_BUF, so we got a warning during log replay (or an ASSERT on a debug build). XFS (md0): Unknown buffer type 0! XFS (md0): _xfs_buf_ioapply: no ops on block 0xaea8802/0x1 Fix this, as was done in f19b872b for 2 other locations with the same problem. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:25 UTC
959e363 arm/xen: Use alloc_percpu rather than __alloc_percpu commit 24d5373dda7c00a438d26016bce140299fae675e upstream. The function xen_guest_init is using __alloc_percpu with an alignment which is not a power of two. However, the percpu allocator never supported alignments which are not a power of two and has always behaved incorrectly in this case. Commit 3ca45a4 "percpu: ensure requested alignment is power of two" introduced a check which triggers a warning [1] when booting linux-next on Xen. But in reality this bug was always present. This can be fixed by replacing the call to __alloc_percpu with alloc_percpu. The latter will use an alignment which is a power of two. [1] [ 0.023921] illegal size (48) or align (48) for percpu allocation [ 0.024167] ------------[ cut here ]------------ [ 0.024344] WARNING: CPU: 0 PID: 1 at linux/mm/percpu.c:892 pcpu_alloc+0x88/0x6c0 [ 0.024584] Modules linked in: [ 0.024708] [ 0.024804] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc7-next-20161128 #473 [ 0.025012] Hardware name: Foundation-v8A (DT) [ 0.025162] task: ffff80003d870000 task.stack: ffff80003d844000 [ 0.025351] PC is at pcpu_alloc+0x88/0x6c0 [ 0.025490] LR is at pcpu_alloc+0x88/0x6c0 [ 0.025624] pc : [<ffff00000818e678>] lr : [<ffff00000818e678>] pstate: 60000045 [ 0.025830] sp : ffff80003d847cd0 [ 0.025946] x29: ffff80003d847cd0 x28: 0000000000000000 [ 0.026147] x27: 0000000000000000 x26: 0000000000000000 [ 0.026348] x25: 0000000000000000 x24: 0000000000000000 [ 0.026549] x23: 0000000000000000 x22: 00000000024000c0 [ 0.026752] x21: ffff000008e97000 x20: 0000000000000000 [ 0.026953] x19: 0000000000000030 x18: 0000000000000010 [ 0.027155] x17: 0000000000000a3f x16: 00000000deadbeef [ 0.027357] x15: 0000000000000006 x14: ffff000088f79c3f [ 0.027573] x13: ffff000008f79c4d x12: 0000000000000041 [ 0.027782] x11: 0000000000000006 x10: 0000000000000042 [ 0.027995] x9 : ffff80003d847a40 x8 : 6f697461636f6c6c [ 0.028208] x7 : 6120757063726570 x6 : ffff000008f79c84 [ 0.028419] 
x5 : 0000000000000005 x4 : 0000000000000000 [ 0.028628] x3 : 0000000000000000 x2 : 000000000000017f [ 0.028840] x1 : ffff80003d870000 x0 : 0000000000000035 [ 0.029056] [ 0.029152] ---[ end trace 0000000000000000 ]--- [ 0.029297] Call trace: [ 0.029403] Exception stack(0xffff80003d847b00 to 0xffff80003d847c30) [ 0.029621] 7b00: 0000000000000030 0001000000000000 ffff80003d847cd0 ffff00000818e678 [ 0.029901] 7b20: 0000000000000002 0000000000000004 ffff000008f7c060 0000000000000035 [ 0.030153] 7b40: ffff000008f79000 ffff000008c4cd88 ffff80003d847bf0 ffff000008101778 [ 0.030402] 7b60: 0000000000000030 0000000000000000 ffff000008e97000 00000000024000c0 [ 0.030647] 7b80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 0.030895] 7ba0: 0000000000000035 ffff80003d870000 000000000000017f 0000000000000000 [ 0.031144] 7bc0: 0000000000000000 0000000000000005 ffff000008f79c84 6120757063726570 [ 0.031394] 7be0: 6f697461636f6c6c ffff80003d847a40 0000000000000042 0000000000000006 [ 0.031643] 7c00: 0000000000000041 ffff000008f79c4d ffff000088f79c3f 0000000000000006 [ 0.031877] 7c20: 00000000deadbeef 0000000000000a3f [ 0.032051] [<ffff00000818e678>] pcpu_alloc+0x88/0x6c0 [ 0.032229] [<ffff00000818ece8>] __alloc_percpu+0x18/0x20 [ 0.032409] [<ffff000008d9606c>] xen_guest_init+0x174/0x2f4 [ 0.032591] [<ffff0000080830f8>] do_one_initcall+0x38/0x130 [ 0.032783] [<ffff000008d90c34>] kernel_init_freeable+0xe0/0x248 [ 0.032995] [<ffff00000899a890>] kernel_init+0x10/0x100 [ 0.033172] [<ffff000008082ec0>] ret_from_fork+0x10/0x50 Reported-by: Wei Chen <wei.chen@arm.com> Link: https://lkml.org/lkml/2016/11/28/669 Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:25 UTC
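The warning in the trace above ("illegal size (48) or align (48)") comes down to whether the alignment is a power of two. A minimal userspace sketch of such a test follows; `is_pow2` is a hypothetical name chosen for illustration, and the sketch does not reproduce the percpu allocator's actual check:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * A power of two has exactly one bit set, so clearing the lowest set bit
 * with n & (n - 1) leaves zero. Zero itself is excluded explicitly.
 */
bool is_pow2(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}
```

An alignment of 48 (binary 110000) has two bits set and fails this test, while 64 passes.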
6fbd3fb xen/gntdev: Use VM_MIXEDMAP instead of VM_IO to avoid NUMA balancing commit 30faaafdfa0c754c91bac60f216c9f34a2bfdf7e upstream. Commit 9c17d96500f7 ("xen/gntdev: Grant maps should not be subject to NUMA balancing") set VM_IO flag to prevent grant maps from being subjected to NUMA balancing. It was discovered recently that this flag causes get_user_pages() to always fail with -EFAULT. check_vma_flags __get_user_pages __get_user_pages_locked __get_user_pages_unlocked get_user_pages_fast iov_iter_get_pages dio_refill_pages do_direct_IO do_blockdev_direct_IO do_blockdev_direct_IO ext4_direct_IO_read generic_file_read_iter aio_run_iocb (which can happen if guest's vdisk has direct-io-safe option). To avoid this let's use VM_MIXEDMAP flag instead --- it prevents NUMA balancing just as VM_IO does and has no effect on check_vma_flags(). Reported-by: Olaf Hering <olaf@aepfle.de> Suggested-by: Hugh Dickins <hughd@google.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Hugh Dickins <hughd@google.com> Tested-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:24 UTC
883f12a tpm xen: Remove bogus tpm_chip_unregister commit 1f0f30e404b3d8f4597a2d9b77fba55452f8fd0e upstream. tpm_chip_unregister can only be called after tpm_chip_register. devm manages the allocation so no unwind is needed here. Fixes: afb5abc262e96 ("tpm: two-phase chip management functions") Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:24 UTC
8419f52 kernel/debug/debug_core.c: more properly delay for secondary CPUs commit 2d13bb6494c807bcf3f78af0e96c0b8615a94385 upstream. We've got a delay loop waiting for secondary CPUs. That loop uses loops_per_jiffy. However, loops_per_jiffy doesn't actually mean how many tight loops make up a jiffy on all architectures. It is quite common to see things like this in the boot log: Calibrating delay loop (skipped), value calculated using timer frequency.. 48.00 BogoMIPS (lpj=24000) In my case I was seeing lots of cases where other CPUs timed out entering the debugger only to print their stack crawls shortly after the kdb> prompt was written. Elsewhere in kgdb we already use udelay(), so that should be safe enough to use to implement our timeout. We'll delay 1 ms for 1000 times, which should give us a full second of delay (just like the old code wanted) but allow us to notice that we're done every 1 ms. [akpm@linux-foundation.org: simplifications, per Daniel] Link: http://lkml.kernel.org/r/1477091361-2039-1-git-send-email-dianders@chromium.org Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Brian Norris <briannorris@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:24 UTC
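The timeout scheme described in the entry above (delay 1 ms up to 1000 times, checking for completion on every pass) can be sketched outside the kernel. Everything here is a hypothetical stand-in: the readiness check and the delay are passed in as callbacks so the loop can run without real hardware, whereas the commit message says the kernel code uses udelay():

```c
#include <stdbool.h>

#define POLL_ITERATIONS 1000  /* 1000 passes ...               */
#define POLL_STEP_US    1000  /* ... of 1 ms each: 1 s in total */

typedef bool (*ready_fn)(void *ctx);
typedef void (*delay_fn)(unsigned int usec);

/*
 * Poll until ready() reports true or the budget is used up. Returns the
 * number of delay steps taken before success, or -1 on timeout.
 */
int poll_with_timeout(ready_fn ready, void *ctx, delay_fn delay)
{
    for (int i = 0; i < POLL_ITERATIONS; i++) {
        if (ready(ctx))
            return i;
        delay(POLL_STEP_US);
    }
    return -1;
}

/* Example callbacks: become ready after a fixed number of checks. */
struct countdown {
    int remaining;
};

bool countdown_ready(void *ctx)
{
    struct countdown *c = ctx;

    if (c->remaining == 0)
        return true;
    c->remaining--;
    return false;
}

/* Fake delay that only records how long the caller would have waited. */
unsigned long total_delay_us;

void fake_delay(unsigned int usec)
{
    total_delay_us += usec;
}
```

Because completion is checked on every 1 ms step, the caller notices success within about a millisecond instead of waiting out the full second.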
63b33e0 watchdog: qcom: fix kernel panic due to external abort on non-linefetch commit f06f35c66fdbd5ac38901a3305ce763a0cd59375 upstream. This patch fixes an off-by-one in the "watchdog: qcom: add option for standalone watchdog not in timer block" patch that causes the following panic on boot: > Unhandled fault: external abort on non-linefetch (0x1008) at 0xc8874002 > pgd = c0204000 > [c8874002] *pgd=87806811, *pte=0b017653, *ppte=0b017453 > Internal error: : 1008 [#1] SMP ARM > CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.8.6 #0 > Hardware name: Generic DT based system > PC is at 0xc02222f4 > LR is at 0x1 > pc : [<c02222f4>] lr : [<00000001>] psr: 00000113 > sp : c782fc98 ip : 00000003 fp : 00000000 > r10: 00000004 r9 : c782e000 r8 : c04ab98c > r7 : 00000001 r6 : c8874002 r5 : c782fe00 r4 : 00000002 > r3 : 00000000 r2 : c782fe00 r1 : 00100000 r0 : c8874002 > Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none > Control: 10c5387d Table: 8020406a DAC: 00000051 > Process swapper/0 (pid: 1, stack limit = 0xc782e210) > Stack: (0xc782fc98 to 0xc7830000) > [...] The WDT_STS (status) needs to be translated via wdt_addr as well. fixes: f0d9d0f4b44a ("watchdog: qcom: add option for standalone watchdog not in timer block") Signed-off-by: Christian Lamparter <chunkeey@gmail.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:24 UTC
bf902ea watchdog: mei_wdt: request stop on reboot to prevent false positive event commit 9eff1140a82db8c5520f76e51c21827b4af670b3 upstream. Systemd on reboot enables shutdown watchdog that leaves the watchdog device open to ensure that even if power down process get stuck the platform reboots nonetheless. The iamt_wdt is an alarm-only watchdog and can't reboot system, but the FW will generate an alarm event reboot was completed in time, as the watchdog is not automatically disabled during power cycle. So we should request stop watchdog on reboot to eliminate wrong alarm from the FW. Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:24 UTC
2f826a7 kernel/watchdog: use nmi registers snapshot in hardlockup handler commit 4d1f0fb096aedea7bb5489af93498a82e467c480 upstream. NMI handler doesn't call set_irq_regs(), it's set only by normal IRQ. Thus get_irq_regs() returns NULL or stale registers snapshot with IP/SP pointing to the code interrupted by IRQ which was interrupted by NMI. NULL isn't a problem: in this case watchdog calls dump_stack() and prints full stack trace including NMI. But if we're stuck in IRQ handler then NMI watchdog will print stack trace without IRQ part at all. This patch uses registers snapshot passed into NMI handler as arguments: these registers point exactly to the instruction interrupted by NMI. Fixes: 55537871ef66 ("kernel/watchdog.c: perform all-CPU backtrace in case of hard lockup") Link: http://lkml.kernel.org/r/146771764784.86724.6006627197118544150.stgit@buzz Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:24 UTC
bbf23f0 CIFS: Fix a possible memory corruption in push locks commit e3d240e9d505fc67f8f8735836df97a794bbd946 upstream. If maxBuf is not 0 but less than a size of SMB2 lock structure we can end up with a memory corruption. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:24 UTC
9f1f507 CIFS: Fix missing nls unload in smb2_reconnect() commit 4772c79599564bd08ee6682715a7d3516f67433f upstream. Acked-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:24 UTC
ff04da3 CIFS: Fix a possible memory corruption during reconnect commit 53e0e11efe9289535b060a51d4cf37c25e0d0f2b upstream. We can not unlock/lock cifs_tcp_ses_lock while walking through ses and tcon lists because it can corrupt list iterator pointers and a tcon structure can be released if we don't hold an extra reference. Fix it by moving a reconnect process to a separate delayed work and acquiring a reference to every tcon that needs to be reconnected. Also do not send an echo request on newly established connections. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:24 UTC
6cb589c ASoC: intel: Fix crash at suspend/resume without card registration commit 2fc995a87f2efcd803438f07bfecd35cc3d90d32 upstream. When ASoC Intel SST Medfield driver is probed but without codec / card assigned, it causes an Oops and freezes the kernel at suspend/resume, PM: Suspending system (freeze) Suspending console(s) (use no_console_suspend to debug) BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 IP: [<ffffffffc09d9409>] sst_soc_prepare+0x19/0xa0 [snd_soc_sst_mfld_platform] Oops: 0000 [#1] PREEMPT SMP CPU: 0 PID: 1552 Comm: systemd-sleep Tainted: G W 4.9.0-rc6-1.g5f5c2ad-default #1 Call Trace: [<ffffffffb45318f9>] dpm_prepare+0x209/0x460 [<ffffffffb4531b61>] dpm_suspend_start+0x11/0x60 [<ffffffffb40d3cc2>] suspend_devices_and_enter+0xb2/0x710 [<ffffffffb40d462e>] pm_suspend+0x30e/0x390 [<ffffffffb40d2eba>] state_store+0x8a/0x90 [<ffffffffb43c670f>] kobj_attr_store+0xf/0x20 [<ffffffffb42b0d97>] sysfs_kf_write+0x37/0x40 [<ffffffffb42b02bc>] kernfs_fop_write+0x11c/0x1b0 [<ffffffffb422be68>] __vfs_write+0x28/0x140 [<ffffffffb43728a8>] ? apparmor_file_permission+0x18/0x20 [<ffffffffb433b2ab>] ? security_file_permission+0x3b/0xc0 [<ffffffffb422d095>] vfs_write+0xb5/0x1a0 [<ffffffffb422e3d6>] SyS_write+0x46/0xa0 [<ffffffffb4719fbb>] entry_SYSCALL_64_fastpath+0x1e/0xad Add proper NULL checks in the PM code of mdfld driver. Signed-off-by: Takashi Iwai <tiwai@suse.de> Acked-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:24 UTC
769c092 dm space map metadata: fix 'struct sm_metadata' leak on failed create commit 314c25c56c1ee5026cf99c570bdfe01847927acb upstream. In dm_sm_metadata_create() we temporarily change the dm_space_map operations from 'ops' (whose .destroy function deallocates the sm_metadata) to 'bootstrap_ops' (whose .destroy function doesn't). If dm_sm_metadata_create() fails in sm_ll_new_metadata() or sm_ll_extend(), it exits back to dm_tm_create_internal(), which calls dm_sm_destroy() with the intention of freeing the sm_metadata, but it doesn't (because the dm_space_map operations is still set to 'bootstrap_ops'). Fix this by setting the dm_space_map operations back to 'ops' if dm_sm_metadata_create() fails when it is set to 'bootstrap_ops'. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:24 UTC
ab10ab0 dm raid: fix discard support regression commit 11e2968478edc07a75ee1efb45011b3033c621c2 upstream. Commit ecbfb9f118 ("dm raid: add raid level takeover support") moved the configure_discard_support() call from raid_ctr() to raid_preresume(). Enabling/disabling discard _must_ happen during table load (through the .ctr hook). Fix this regression by moving the configure_discard_support() call back to raid_ctr(). Fixes: ecbfb9f118 ("dm raid: add raid level takeover support") Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:23 UTC
454b98d dm rq: fix a race condition in rq_completed() commit d15bb3a6467e102e60d954aadda5fb19ce6fd8ec upstream. It is required to hold the queue lock when calling blk_run_queue_async() to avoid that a race between blk_run_queue_async() and blk_cleanup_queue() is triggered. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:23 UTC
26011e6 dm crypt: mark key as invalid until properly loaded commit 265e9098bac02bc5e36cda21fdbad34cb5b2f48d upstream. In crypt_set_key(), if a failure occurs while replacing the old key (e.g. tfm->setkey() fails) the key must not have DM_CRYPT_KEY_VALID flag set. Otherwise, the crypto layer would have an invalid key that still has DM_CRYPT_KEY_VALID flag set. Signed-off-by: Ondrej Kozina <okozina@redhat.com> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:23 UTC
bd5fcd1 dm flakey: return -EINVAL on interval bounds error in flakey_ctr() commit bff7e067ee518f9ed7e1cbc63e4c9e01670d0b71 upstream. Fix to return error code -EINVAL instead of 0, as is done elsewhere in this function. Fixes: e80d1c805a3b ("dm: do not override error code returned from dm_get_device()") Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:23 UTC
1ca66d6 dm table: an 'all_blk_mq' table must be loaded for a blk-mq DM device commit 301fc3f5efb98633115bd887655b19f42c6dfaa8 upstream. When dm_table_set_type() is used by a target to establish a DM table's type (e.g. DM_TYPE_MQ_REQUEST_BASED in the case of DM multipath) the DM core must go on to verify that the devices in the table are compatible with the established type. Fixes: e83068a5 ("dm mpath: add optional "queue_mode" feature") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:23 UTC
d948d3b dm table: fix 'all_blk_mq' inconsistency when an empty table is loaded commit 6936c12cf809850180b24947271b8f068fdb15e9 upstream. An earlier DM multipath table could have been built on top of underlying devices that were all using blk-mq. In that case, if that active multipath table is replaced with an empty DM multipath table (that reflects all paths have failed) then it is important that the 'all_blk_mq' state of the active table is transferred to the new empty DM table. Otherwise dm-rq.c:dm_old_prep_tio() will incorrectly clone a request that isn't needed by the DM multipath target when it is to issue IO to an underlying blk-mq device. Fixes: e83068a5 ("dm mpath: add optional "queue_mode" feature") Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:23 UTC
45f6311 blk-mq: Do not invoke .queue_rq() for a stopped queue commit bc27c01b5c46d3bfec42c96537c7a3fae0bb2cc4 upstream. The meaning of the BLK_MQ_S_STOPPED flag is "do not call .queue_rq()". Hence modify blk_mq_make_request() such that requests are queued instead of issued if a queue has been stopped. Reported-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:23 UTC
e3742a1 PM / OPP: Pass opp_table to dev_pm_opp_put_regulator() commit 91291d9ad92faa65a56a9a19d658d8049b78d3d4 upstream. Joonyoung Shim reported an interesting problem on his ARM octa-core Odroid-XU3 platform. During system suspend, dev_pm_opp_put_regulator() was failing for a struct device for which dev_pm_opp_set_regulator() is called earlier. This happened because an earlier call to dev_pm_opp_of_cpumask_remove_table() function (from cpufreq-dt.c file) removed all the entries from opp_table->dev_list apart from the last CPU device in the cpumask of CPUs sharing the OPP. But both dev_pm_opp_set_regulator() and dev_pm_opp_put_regulator() routines get CPU device for the first CPU in the cpumask. And so the OPP core failed to find the OPP table for the struct device. This patch attempts to fix this problem by returning a pointer to the opp_table from dev_pm_opp_set_regulator() and using that as the parameter to dev_pm_opp_put_regulator(). This ensures that the dev_pm_opp_put_regulator() doesn't fail to find the opp table. Note that similar design problem also exists with other dev_pm_opp_put_*() APIs, but those aren't used currently by anyone and so we don't need to update them for now. Reported-by: Joonyoung Shim <jy0922.shim@samsung.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ Viresh: Wrote commit log and tested on exynos 5250 ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:23 UTC
8b63a92 usb: gadget: composite: always set ep->mult to a sensible value commit eaa496ffaaf19591fe471a36cef366146eeb9153 upstream. ep->mult is supposed to be set to Isochronous and Interrupt Endpoint's multiplier value. This value is computed from different places depending on the link speed. If we're dealing with HighSpeed, then it's part of bits [12:11] of wMaxPacketSize. This case wasn't taken into consideration before. While at that, also make sure the ep->mult defaults to one so drivers can use it unconditionally and assume they'll never multiply ep->maxpacket to zero. Cc: <stable@vger.kernel.org> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:23 UTC
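As a rough illustration of the bit layout mentioned in the entry above, the following C sketch decodes a high-speed wMaxPacketSize value. The helper names hs_ep_mult() and hs_ep_payload() are hypothetical and are not the kernel's API; the sketch assumes the USB 2.0 layout in which bits [10:0] carry the payload size and bits [12:11] carry the number of additional transactions per microframe.

```c
#include <stdint.h>

/* Hypothetical helper (not a kernel function): multiplier for a
 * high-speed isochronous or interrupt endpoint. Bits [12:11] hold the
 * number of ADDITIONAL transactions per microframe, so the result is
 * that field plus one and can never be zero. */
static unsigned int hs_ep_mult(uint16_t w_max_packet_size)
{
    unsigned int extra = (w_max_packet_size >> 11) & 0x3;
    return extra + 1;
}

/* Hypothetical helper: payload size in bytes, taken from bits [10:0]. */
static unsigned int hs_ep_payload(uint16_t w_max_packet_size)
{
    return w_max_packet_size & 0x7ff;
}
```

With this layout, a descriptor value of 0x1400 would describe 1024-byte packets sent three times per microframe, while a plain 0x0400 yields a multiplier of one, which matches the "defaults to one" behaviour the commit asks for.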
d4f4b2e mm, page_alloc: keep pcp count and list contents in sync if struct page is corrupted commit a6de734bc002fe2027ccc074fbbd87d72957b7a4 upstream. Vlastimil Babka pointed out that commit 479f854a207c ("mm, page_alloc: defer debugging checks of pages allocated from the PCP") will allow the per-cpu list counter to be out of sync with the per-cpu list contents if a struct page is corrupted. The consequence is an infinite loop if the per-cpu lists get fully drained by free_pcppages_bulk because all the lists are empty but the count is positive. The infinite loop occurs here do { batch_free++; if (++migratetype == MIGRATE_PCPTYPES) migratetype = 0; list = &pcp->lists[migratetype]; } while (list_empty(list)); What the user sees is a bad page warning followed by a soft lockup with interrupts disabled in free_pcppages_bulk(). This patch keeps the accounting in sync. Fixes: 479f854a207c ("mm, page_alloc: defer debugging checks of pages allocated from the PCP") Link: http://lkml.kernel.org/r/20161202112951.23346-2-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:23 UTC
0927d28 mm/vmscan.c: set correct defer count for shrinker commit 5f33a0803bbd781de916f5c7448cbbbbc763d911 upstream. Our system uses significantly more slab memory with memcg enabled with the latest kernel. With 3.10 kernel, slab uses 2G memory, while with 4.6 kernel, 6G memory is used. The shrinker has a problem. Let's say we have two memcgs for one shrinker. In do_shrink_slab: 1. Check cg1. nr_deferred = 0, assume total_scan = 700. batch size is 1024, then no memory is freed. nr_deferred = 700 2. Check cg2. nr_deferred = 700. Assume freeable = 20, then total_scan = 10 or 40. Let's assume it's 10. No memory is freed. nr_deferred = 10. The deferred share of cg1 is lost in this case. kswapd will free no memory even if the above steps run again and again. The fix makes sure one memcg's deferred share isn't lost. Link: http://lkml.kernel.org/r/2414be961b5d25892060315fbb56bb19d81d0c07.1476227351.git.shli@fb.com Signed-off-by: Shaohua Li <shli@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:23 UTC
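The two-step arithmetic in the entry above can be replayed with a small userspace model. This is a simplified sketch, not the kernel's do_shrink_slab(): the function deferred_after(), its parameters, and the capping rules (limit to freeable/2 when the new work is small, never exceed 2 * freeable) are assumptions chosen to reproduce the numbers quoted in the commit message.

```c
/* Simplified, hypothetical model of one shrinker pass for one memcg.
 * deferred_in:   work carried over from earlier passes
 * delta:         new work computed for this pass
 * freeable:      objects this memcg could free
 * batch:         scan batch size
 * keep_uncapped: 0 models the old accounting, 1 models the fix
 * Returns the deferred count left behind for the next pass. */
static long deferred_after(long deferred_in, long delta, long freeable,
                           long batch, int keep_uncapped)
{
    long total = deferred_in + delta;
    long uncapped = total;   /* remembered before any capping */
    long scanned = 0;

    if (freeable <= 0 || batch <= 0)
        return deferred_in;  /* nothing to do in this model */

    /* Assumed caps: small deltas are limited to half the freeable
     * objects, and the total never exceeds twice the freeable objects. */
    if (delta < freeable / 4 && total > freeable / 2)
        total = freeable / 2;
    if (total > freeable * 2)
        total = freeable * 2;

    /* Scan in batches while there is enough work to justify it. */
    while (total >= batch || total >= freeable) {
        long n = total < batch ? total : batch;
        scanned += n;
        total -= n;
    }

    if (!keep_uncapped)
        return total;        /* old: the capped remainder overwrites history */

    /* fixed: subtract only what was really scanned from the uncapped total */
    return uncapped >= scanned ? uncapped - scanned : 0;
}
```

Feeding in the message's numbers (cg1: 700 units of new work, batch 1024; cg2: 700 deferred, freeable 20) leaves 10 deferred under the old accounting and 700 under the fixed accounting, which is the "lost share" the commit describes.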
3e0ef1b nvmet: Fix possible infinite loop triggered on hot namespace removal commit e4fcf07cca6a3b6c4be00df16f08be894325eaa3 upstream. When removing a namespace we delete it from the subsystem namespaces list with list_del_init which allows us to know if it is enabled or not. The problem is that list_del_init initializes the list next pointer and does not respect the RCU list-traversal we do on the IO path for locating a namespace. Instead we need to use list_del_rcu which is allowed to run concurrently with the _rcu list-traversal primitives (keeps list next intact) and guarantees concurrent nvmet_find_namespace forward progress. By changing that, we cannot rely on ns->dev_link for knowing if the namespace is enabled, so add an enabled indicator entry to nvmet_ns for that. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Solganik Alexander <sashas@lightbitslabs.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:23 UTC
6290a3b loop: return proper error from loop_queue_rq() commit b4a567e8114327518c09f5632339a5954ab975a3 upstream. ->queue_rq() should return one of the BLK_MQ_RQ_QUEUE_* constants, not an errno. Fixes: f4aa4c7bbac6 ("block: loop: convert to per-device workqueue") Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:22 UTC
bf0f020 f2fs: fix overflow due to condition check order commit e87f7329bbd6760c2acc4f1eb423362b08851a71 upstream. In the last ilen case, i was already increased, resulting in accessing out- of-boundary entry of do_replace and blkaddr. Fix to check ilen first to exit the loop. Fixes: 2aa8fbb9693020 ("f2fs: refactor __exchange_data_block for speed up") Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:22 UTC
154d83a f2fs: set ->owner for debugfs status file's file_operations commit 05e6ea2685c964db1e675a24a4f4e2adc22d2388 upstream. The struct file_operations instance serving the f2fs/status debugfs file lacks an initialization of its ->owner. This means that although that file might have been opened, the f2fs module can still get removed. Any further operation on that opened file, releasing included, will cause accesses to unmapped memory. Indeed, Mike Marshall reported the following: BUG: unable to handle kernel paging request at ffffffffa0307430 IP: [<ffffffff8132a224>] full_proxy_release+0x24/0x90 <...> Call Trace: [] __fput+0xdf/0x1d0 [] ____fput+0xe/0x10 [] task_work_run+0x8e/0xc0 [] do_exit+0x2ae/0xae0 [] ? __audit_syscall_entry+0xae/0x100 [] ? syscall_trace_enter+0x1ca/0x310 [] do_group_exit+0x44/0xc0 [] SyS_exit_group+0x14/0x20 [] do_syscall_64+0x61/0x150 [] entry_SYSCALL64_slow_path+0x25/0x25 <...> ---[ end trace f22ae883fa3ea6b8 ]--- Fixing recursive fault but reboot is needed! Fix this by initializing the f2fs/status file_operations' ->owner with THIS_MODULE. This will allow debugfs to grab a reference to the f2fs module upon any open on that file, thus preventing it from getting removed. Fixes: 902829aa0b72 ("f2fs: move proc files to debugfs") Reported-by: Mike Marshall <hubcap@omnibond.com> Reported-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:22 UTC
67e5239 Revert "f2fs: use percpu_counter for # of dirty pages in inode" commit 204706c7accfabb67b97eef9f9a28361b6201199 upstream. This reverts commit 1beba1b3a953107c3ff5448ab4e4297db4619c76. The percpu_counter doesn't provide atomicity in single core and consumes more DRAM. That incurs fs_mark test failure due to ENOMEM. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:22 UTC
d06eaf2 ext4: do not perform data journaling when data is encrypted commit 73b92a2a5e97d17cc4d5c4fe9d724d3273fb6fd2 upstream. Currently data journalling is incompatible with encryption: enabling both at the same time has never been supported by design, and would result in unpredictable behavior. However, users are not precluded from turning on both features simultaneously. This change programmatically replaces data journaling for encrypted regular files with ordered data journaling mode. Background: Journaling encrypted data has not been supported because it operates on buffer heads of the page in the page cache. Namely, when the commit happens, which could be up to five seconds after caching, the commit thread uses the buffer heads attached to the page to copy the contents of the page to the journal. With encryption, it would have been required to keep the bounce buffer with ciphertext for up to the aforementioned five seconds, since the page cache can only hold plaintext and could not be used for journaling. Alternatively, it would be required to setup the journal to initiate a callback at the commit time to perform deferred encryption - in this case, not only would the data have to be written twice, but it would also have to be encrypted twice. This level of complexity was not justified for a mode that in practice is very rarely used because of the overhead from the data journalling. Solution: If data=journaled has been set as a mount option for a filesystem, or if journaling is enabled on a regular file, do not perform journaling if the file is also encrypted, instead fall back to the data=ordered mode for the file. Rationale: The intent is to allow seamless and proper filesystem operation when journaling and encryption have both been enabled, and have these two conflicting features gracefully resolved by the filesystem. 
Fixes: 4461471107b7 Signed-off-by: Sergey Karamov <skaramov@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:22 UTC
e33673b ext4: return -ENOMEM instead of success commit 578620f451f836389424833f1454eeeb2ffc9e9f upstream. We should set the error code if kzalloc() fails. Fixes: 67cf5b09a46f ("ext4: add the basic function for inline data support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:22 UTC
3664877 ext4: reject inodes with negative size commit 7e6e1ef48fc02f3ac5d0edecbb0c6087cd758d58 upstream. Don't load an inode with a negative size; this causes integer overflow problems in the VFS. [ Added EXT4_ERROR_INODE() to mark file system as corrupted. -TYT] Fixes: a48380f769df (ext4: rename i_dir_acl to i_size_high) Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:22 UTC
1bfcffb ext4: add sanity checking to count_overhead() commit c48ae41bafe31e9a66d8be2ced4e42a6b57fa814 upstream. The commit "ext4: sanity check the block and cluster size at mount time" should prevent any problems, but in case the superblock is modified while the file system is mounted, add an extra safety check to make sure we won't overrun the allocated buffer. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:22 UTC
9689eb9 ext4: fix in-superblock mount options processing commit 5aee0f8a3f42c94c5012f1673420aee96315925a upstream. Fix a large number of problems with how we handle mount options in the superblock. For one, if the string in the superblock is long enough that it is not null terminated, we could run off the end of the string and try to interpret superblock fields as characters. It's unlikely this will cause a security problem, but it could result in an invalid parse. Also, parse_options is destructive to the string, so in some cases if there is a comma-separated string, it would be modified in the superblock. (Fortunately it only happens on file systems with a 1k block size.) Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:22 UTC
52a9daa ext4: use more strict checks for inodes_per_block on mount commit cd6bb35bf7f6d7d922509bf50265383a0ceabe96 upstream. Centralize the checks for inodes_per_block and be more strict to make sure the inodes_per_block_group can't end up being zero. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:22 UTC
7505584 ext4: fix stack memory corruption with 64k block size commit 30a9d7afe70ed6bd9191d3000e2ef1a34fb58493 upstream. The number of 'counters' elements needed in 'struct sg' is super_block->s_blocksize_bits + 2. Presently we have 16 'counters' elements in the array. This is insufficient for block sizes >= 32k. In such cases the memcpy operation performed in ext4_mb_seq_groups_show() would cause stack memory corruption. Fixes: c9de560ded61f Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:22 UTC
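The sizing rule in the entry above is easy to check by hand. The sketch below is illustrative only: OLD_COUNTERS, counters_needed() and fits_old_array() are hypothetical names, and the only facts taken from the commit message are the formula (s_blocksize_bits + 2) and the old array length of 16.

```c
/* Length of the fixed 'counters' array described in the commit message. */
#define OLD_COUNTERS 16

/* Elements needed for a block size of 2^blocksize_bits bytes. */
static unsigned int counters_needed(unsigned int blocksize_bits)
{
    return blocksize_bits + 2;
}

/* Nonzero when the old 16-element array is large enough. */
static int fits_old_array(unsigned int blocksize_bits)
{
    return counters_needed(blocksize_bits) <= OLD_COUNTERS;
}
```

A 4k block size (12 bits) needs 14 elements and a 16k block size (14 bits) needs exactly 16, so both fit; 32k (15 bits) needs 17 and 64k (16 bits) needs 18, which is why the copy overran the array for block sizes >= 32k.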
86efd99 ext4: fix mballoc breakage with 64k block size commit 69e43e8cc971a79dd1ee5d4343d8e63f82725123 upstream. 'border' variable is set to a value of 2 times the block size of the underlying filesystem. With 64k block size, the resulting value won't fit into a 16-bit variable. Hence this commit changes the data type of 'border' to 'unsigned int'. Fixes: c9de560ded61f Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:21 UTC
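The overflow described in the entry above can be reproduced in userspace. This is a minimal sketch assuming only what the message states, namely that 'border' holds twice the block size and used to live in a 16-bit variable; the function names are hypothetical and the code is not taken from mballoc.

```c
#include <stdint.h>

/* 'border' as it would be stored in a 16-bit variable: the value is
 * reduced modulo 65536 when it is narrowed. */
static uint16_t border_16bit(uint32_t blocksize)
{
    return (uint16_t)(2u * blocksize);
}

/* 'border' stored in a 32-bit unsigned variable, as after the fix. */
static uint32_t border_32bit(uint32_t blocksize)
{
    return 2u * blocksize;
}
```

For a 4k block size both variants give 8192. For a 64k block size the true value is 131072, which the 16-bit variant truncates to 0.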
8022387 crypto: caam - fix AEAD givenc descriptors commit d128af17876d79b87edf048303f98b35f6a53dbc upstream. The AEAD givenc descriptor relies on moving the IV through the output FIFO and then back to the CTX2 for authentication. The SEQ FIFO STORE could be scheduled before the data can be read from OFIFO, especially since the SEQ FIFO LOAD needs to wait for the SEQ FIFO LOAD SKIP to finish first. The SKIP takes more time when the input is SG than when it's a contiguous buffer. If the SEQ FIFO LOAD is not scheduled before the STORE, the DECO will hang waiting for data to be available in the OFIFO so it can be transferred to C2. In order to overcome this, first force transfer of IV to C2 by starting the "cryptlen" transfer first and then starting to store data from OFIFO to the output buffer. Fixes: 1acebad3d8db8 ("crypto: caam - faster aead implementation") Signed-off-by: Alex Porosanu <alexandru.porosanu@nxp.com> Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:21 UTC
ade692b ptrace: Capture the ptracer's creds not PT_PTRACE_CAP commit 64b875f7ac8a5d60a4e191479299e931ee949b67 upstream. When the flag PT_PTRACE_CAP was added the PTRACE_TRACEME path was overlooked. This can result in incorrect behavior when an application like strace traces an exec of a setuid executable. Further PT_PTRACE_CAP does not have enough information for making good security decisions as it does not report which user namespace the capability is in. This has already allowed one mistake through insufficient granularity. I found this issue when I was testing another corner case of exec and discovered that I could not get strace to set PT_PTRACE_CAP even when running strace as root with a full set of caps. This change fixes the above issue with strace allowing stracing as root a setuid executable without disabling setuid. More fundamentally this change allows what is allowable at all times, by using the correct information in its decision. Fixes: 4214e42f96d4 ("v2.4.9.11 -> v2.4.9.12") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:21 UTC
23d179a vfs,mm: fix return value of read() at s_maxbytes commit d05c5f7ba164aed3db02fb188c26d0dd94f5455b upstream. We truncated the possible read iterator to s_maxbytes in commit c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()"), but our end condition handling was wrong: it's not an error to try to read at the end of the file. Reading past the end should return EOF (0), not EINVAL. See for example https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1649342 http://lists.gnu.org/archive/html/bug-coreutils/2016-12/msg00008.html where a md5sum of a maximally sized file fails because the final read is exactly at s_maxbytes. Fixes: c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()") Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com> Cc: Wei Fang <fangwei1@huawei.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:21 UTC
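The end-condition rule in the entry above can be sketched as a small pure function. This is an illustrative model under stated assumptions, not the kernel's read path: readable_bytes() is a hypothetical name, and the only behaviour taken from the commit message is that a read starting at or past s_maxbytes yields EOF (0) rather than an error, with earlier reads truncated at the limit.

```c
#include <stdint.h>

/* Hypothetical helper: how many bytes a read of 'count' bytes at offset
 * 'pos' may cover when the filesystem limit is 'maxbytes'.
 * Returns 0 (EOF) at or past the limit instead of an error code. */
static int64_t readable_bytes(int64_t pos, int64_t count, int64_t maxbytes)
{
    if (pos >= maxbytes)
        return 0;                  /* EOF, not -EINVAL */
    if (count > maxbytes - pos)
        count = maxbytes - pos;    /* truncate the read at the limit */
    return count;
}
```

In this model, a tool that reads a maximally sized file in chunks sees a short final chunk followed by a 0-byte read, which is the normal EOF pattern that md5sum expects.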
e45692f mm: Add a user_ns owner to mm_struct and fix ptrace permission checks commit bfedb589252c01fa505ac9f6f2a3d5d68d707ef4 upstream. During exec dumpable is cleared if the file that is being executed is not readable by the user executing the file. A bug in ptrace_may_access allows reading the file if the executable happens to enter into a subordinate user namespace (aka clone(CLONE_NEWUSER), unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER)). This problem is fixed with only necessary userspace breakage by adding a user namespace owner to mm_struct, captured at the time of exec, so it is clear in which user namespace CAP_SYS_PTRACE must be present in to be able to safely give read permission to the executable. The function ptrace_may_access is modified to verify that the ptracer has CAP_SYS_ADMIN in task->mm->user_ns instead of task->cred->user_ns. This ensures that if the task changes its cred into a subordinate user namespace it does not become ptraceable. The function ptrace_attach is modified to only set PT_PTRACE_CAP when CAP_SYS_PTRACE is held over task->mm->user_ns. The intent of PT_PTRACE_CAP is to be a flag to note that whatever permission changes the task might go through the tracer has sufficient permissions for it not to be an issue. task->cred->user_ns is always the same as or descendant of mm->user_ns. Which guarantees that having CAP_SYS_PTRACE over mm->user_ns is the worst case for the task's credentials. To prevent regressions mm->dumpable and mm->user_ns are not considered when a task has no mm. As simply failing ptrace_may_attach causes regressions in privileged applications attempting to read things such as /proc/<pid>/stat Acked-by: Kees Cook <keescook@chromium.org> Tested-by: Cyrill Gorcunov <gorcunov@openvz.org> Fixes: 8409cca70561 ("userns: allow ptrace from non-init user namespaces") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:21 UTC
04804d8 block_dev: don't test bdev->bd_contains when it is not stable commit bcc7f5b4bee8e327689a4d994022765855c807ff upstream. bdev->bd_contains is not stable before calling __blkdev_get(). When __blkdev_get() is called on a partition with ->bd_openers == 0 it sets bdev->bd_contains = bdev; which is not correct for a partition. After a call to __blkdev_get() succeeds, ->bd_openers will be > 0 and then ->bd_contains is stable. When FMODE_EXCL is used, blkdev_get() calls bd_start_claiming() -> bd_prepare_to_claim() -> bd_may_claim() This call happens before __blkdev_get() is called, so ->bd_contains is not stable. So bd_may_claim() cannot safely use ->bd_contains. It currently tries to use it, and this can lead to a BUG_ON(). This happens when a whole device is already open with a bd_holder (in use by dm in my particular example) and two threads race to open a partition of that device for the first time, one opening with O_EXCL and one without. The thread that doesn't use O_EXCL gets through blkdev_get() to __blkdev_get(), gains the ->bd_mutex, and sets bdev->bd_contains = bdev; Immediately thereafter the other thread, using FMODE_EXCL, calls bd_start_claiming() from blkdev_get(). This should fail because the whole device has a holder, but because bdev->bd_contains == bdev bd_may_claim() incorrectly reports success. This thread continues and blocks on bd_mutex. The first thread then sets bdev->bd_contains correctly and drops the mutex. The thread using FMODE_EXCL then continues and when it calls bd_may_claim() again in: BUG_ON(!bd_may_claim(bdev, whole, holder)); The BUG_ON fires. Fix this by removing the dependency on ->bd_contains in bd_may_claim(). As bd_may_claim() has direct access to the whole device, it can simply test if the target bdev is the whole device. 
Fixes: 6b4517a7913a ("block: implement bd_claiming and claiming block") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:21 UTC
52d6972 fs: exec: apply CLOEXEC before changing dumpable task flags commit 613cc2b6f272c1a8ad33aefa21cad77af23139f7 upstream. If you have a process that has set itself to be non-dumpable, and it then undergoes exec(2), any CLOEXEC file descriptors it has open are "exposed" during a race window between the dumpable flags of the process being reset for exec(2) and CLOEXEC being applied to the file descriptors. This can be exploited by a process by attempting to access /proc/<pid>/fd/... during this window, without requiring CAP_SYS_PTRACE. The race in question is after set_dumpable has been (for get_link, though the trace is basically the same for readlink): [vfs] -> proc_pid_link_inode_operations.get_link -> proc_pid_get_link -> proc_fd_access_allowed -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS); Which will return 0, during the race window and CLOEXEC file descriptors will still be open during this window because do_close_on_exec has not been called yet. As a result, the ordering of these calls should be reversed to avoid this race window. This is of particular concern to container runtimes, where joining a PID namespace with file descriptors referring to the host filesystem can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect against access of CLOEXEC file descriptors -- file descriptors which may reference filesystem objects the container shouldn't have access to). Cc: dev@opencontainers.org Reported-by: Michael Crosby <crosbymichael@gmail.com> Signed-off-by: Aleksa Sarai <asarai@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:21 UTC
781e976 exec: Ensure mm->user_ns contains the execed files commit f84df2a6f268de584a201e8911384a2d244876e3 upstream. When the user namespace support was merged the need to prevent ptrace from revealing the contents of an unreadable executable was overlooked. Correct this oversight by ensuring that the executed file or files are in mm->user_ns, by adjusting mm->user_ns. Use the new function privileged_wrt_inode_uidgid to see if the executable is a member of the user namespace, and as such if having CAP_SYS_PTRACE in the user namespace should allow tracing the executable. If not update mm->user_ns to the parent user namespace until an appropriate parent is found. Reported-by: Jann Horn <jann@thejh.net> Fixes: 9e4a36ece652 ("userns: Fail exec for suid and sgid binaries with ids outside our user namespace.") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:21 UTC
fc1d3e5 btrfs: make file clone aware of fatal signals commit 69ae5e4459e43e56f03d0987e865fbac2b05af2a upstream. Indeed this just makes the behavior similar to xfs when a process has fatal signals pending, and it'll make fstests/generic/298 happy. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:21 UTC
8c59356 Btrfs: fix incremental send failure caused by balance commit d5e84fd8d0634d056248b67463b42f6c85896a19 upstream. Commit 951555856b88 ("Btrfs: send, don't bug on inconsistent snapshots") removed some BUG_ON() statements (replacing them with returning errors to user space and logging error messages) when a snapshot is in an inconsistent state due to failures to update a delayed inode item (ENOMEM or ENOSPC) after adding/updating/deleting references, xattrs or file extent items. However there is a case, when no errors happen, where a file extent item can be modified without having the corresponding inode item updated. This case happens during balance under very specific timings, when relocation is in the stage where it updates data pointers and a leaf that contains file extent items is COWed. When that happens file extent items get their disk_bytenr field updated to a new value that reflects the post relocation logical address of the extent, without updating their respective inode items (as there is nothing that needs to be updated on them). This is performed at relocation.c:replace_file_extents() through relocation.c:btrfs_reloc_cow_block(). So make an incremental send deal with this case and don't do any processing for a file extent item that got its disk_bytenr field updated by relocation, since the extent's data is the same as the one pointed by the file extent item in the parent snapshot. After the recent commit mentioned above this case resulted in EIO errors returned to user space (and an error message logged to dmesg/syslog) when doing an incremental send, while before it, it resulted in hitting a BUG_ON leading to the following trace: [ 952.206705] ------------[ cut here ]------------ [ 952.206714] kernel BUG at ../fs/btrfs/send.c:5653! 
[ 952.206719] Internal error: Oops - BUG: 0 [#1] SMP [ 952.209854] Modules linked in: st dm_mod nls_utf8 isofs fuse nf_log_ipv6 xt_pkttype xt_physdev br_netfilter nf_log_ipv4 nf_log_common xt_LOG xt_limit ebtable_filter ebtables af_packet bridge stp llc ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables xfs libcrc32c nls_iso8859_1 nls_cp437 vfat fat joydev aes_ce_blk ablk_helper cryptd snd_intel8x0 aes_ce_cipher snd_ac97_codec ac97_bus snd_pcm ghash_ce sha2_ce sha1_ce snd_timer snd virtio_net soundcore btrfs xor sr_mod cdrom hid_generic usbhid raid6_pq virtio_blk virtio_scsi bochs_drm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_mmio xhci_pci xhci_hcd usbcore usb_common virtio_pci virtio_ring virtio drm sg efivarfs [ 952.228333] Supported: Yes [ 952.228908] CPU: 0 PID: 12779 Comm: snapperd Not tainted 4.4.14-50-default #1 [ 952.230329] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 [ 952.231683] task: ffff800058e94100 ti: ffff8000d866c000 task.ti: ffff8000d866c000 [ 952.233279] PC is at changed_cb+0x9f4/0xa48 [btrfs] [ 952.234375] LR is at changed_cb+0x58/0xa48 [btrfs] [ 952.236552] pc : [<ffff7ffffc39de7c>] lr : [<ffff7ffffc39d4e0>] pstate: 80000145 [ 952.238049] sp : ffff8000d866fa20 [ 952.238732] x29: ffff8000d866fa20 x28: 0000000000000019 [ 952.239840] x27: 00000000000028d5 x26: 00000000000024a2 [ 952.241008] x25: 0000000000000002 x24: ffff8000e66e92f0 [ 952.242131] x23: ffff8000b8c76800 x22: ffff800092879140 [ 952.243238] x21: 0000000000000002 x20: ffff8000d866fb78 [ 952.244348] x19: ffff8000b8f8c200 x18: 0000000000002710 [ 952.245607] x17: 0000ffff90d42480 x16: ffff800000237dc0 [ 952.246719] x15: 0000ffff90de7510 x14: ab000c000a2faf08 [ 952.247835] x13: 0000000000577c2b x12: ab000c000b696665 
[ 952.248981] x11: 2e65726f632f6966 x10: 652d34366d72612f [ 952.250101] x9 : 32627572672f746f x8 : ab000c00092f1671 [ 952.251352] x7 : 8000000000577c2b x6 : ffff800053eadf45 [ 952.252468] x5 : 0000000000000000 x4 : ffff80005e169494 [ 952.253582] x3 : 0000000000000004 x2 : ffff8000d866fb78 [ 952.254695] x1 : 000000000003e2a3 x0 : 000000000003e2a4 [ 952.255803] [ 952.256150] Process snapperd (pid: 12779, stack limit = 0xffff8000d866c020) [ 952.257516] Stack: (0xffff8000d866fa20 to 0xffff8000d8670000) [ 952.258654] fa20: ffff8000d866fae0 ffff7ffffc308fc0 ffff800092879140 ffff8000e66e92f0 [ 952.260219] fa40: 0000000000000035 ffff800055de6000 ffff8000b8c76800 ffff8000d866fb78 [ 952.261745] fa60: 0000000000000002 00000000000024a2 00000000000028d5 0000000000000019 [ 952.263269] fa80: ffff8000d866fae0 ffff7ffffc3090f0 ffff8000d866fae0 ffff7ffffc309128 [ 952.264797] faa0: ffff800092879140 ffff8000e66e92f0 0000000000000035 ffff800055de6000 [ 952.268261] fac0: ffff8000b8c76800 ffff8000d866fb78 0000000000000002 0000000000001000 [ 952.269822] fae0: ffff8000d866fbc0 ffff7ffffc39ecfc ffff8000b8f8c200 ffff8000b8f8c368 [ 952.271368] fb00: ffff8000b8f8c378 ffff800055de6000 0000000000000001 ffff8000ecb17500 [ 952.272893] fb20: ffff8000b8c76800 ffff800092879140 ffff800062b6d000 ffff80007a9e2470 [ 952.274420] fb40: ffff8000b8f8c208 0000000005784000 ffff8000580a8000 ffff8000b8f8c200 [ 952.276088] fb60: ffff7ffffc39d488 00000002b8f8c368 0000000000000000 000000000003e2a4 [ 952.280275] fb80: 000000000000006c ffff7ffffc39ec00 000000000003e2a4 000000000000006c [ 952.283219] fba0: ffff8000b8f8c300 0000000000000100 0000000000000001 ffff8000ecb17500 [ 952.286166] fbc0: ffff8000d866fcd0 ffff7ffffc3643c0 ffff8000f8842700 0000ffff8ffe9278 [ 952.289136] fbe0: 0000000040489426 ffff800055de6000 0000ffff8ffe9278 0000000040489426 [ 952.292083] fc00: 000000000000011d 000000000000001d ffff80007a9e4598 ffff80007a9e43e8 [ 952.294959] fc20: ffff8000b8c7693f 0000000000003b24 0000000000000019 ffff8000b8f8c218 
[ 952.301161] fc40: 00000001d866fc70 ffff8000b8c76800 0000000000000128 ffffffffffffff84 [ 952.305749] fc60: ffff800058e941ff 0000000000003a58 ffff8000d866fcb0 ffff8000000f7390 [ 952.308875] fc80: 000000000000012a 0000000000010290 ffff8000d866fc00 000000000000007b [ 952.311915] fca0: 0000000000010290 ffff800046c1b100 74732d7366727462 000001006d616572 [ 952.314937] fcc0: ffff8000fffc4100 cb88537fdc8ba60e ffff8000d866fe10 ffff8000002499e8 [ 952.318008] fce0: 0000000040489426 ffff8000f8842700 0000ffff8ffe9278 ffff80007a9e4598 [ 952.321321] fd00: 0000ffff8ffe9278 0000000040489426 000000000000011d 000000000000001d [ 952.324280] fd20: ffff80000072c000 ffff8000d866c000 ffff8000d866fda0 ffff8000000e997c [ 952.327156] fd40: ffff8000fffc4180 00000000000031ed ffff8000fffc4180 ffff800046c1b7d4 [ 952.329895] fd60: 0000000000000140 0000ffff907ea170 000000000000011d 00000000000000dc [ 952.334641] fd80: ffff80000072c000 ffff8000d866c000 0000000000000000 0000000000000002 [ 952.338002] fda0: ffff8000d866fdd0 ffff8000000ebacc ffff800046c1b080 ffff800046c1b7d4 [ 952.340724] fdc0: ffff8000d866fdf0 ffff8000000db67c 0000000000000040 ffff800000e69198 [ 952.343415] fde0: 0000ffff8ffea790 00000000000031ed ffff8000d866fe20 ffff800000254000 [ 952.346101] fe00: 000000000000001d 0000000000000004 ffff8000d866fe90 ffff800000249d3c [ 952.348980] fe20: ffff8000f8842700 0000000000000000 ffff8000f8842701 0000000000000008 [ 952.351696] fe40: ffff8000d866fe70 0000000000000008 ffff8000d866fe90 ffff800000249cf8 [ 952.354387] fe60: ffff8000f8842700 0000ffff8ffe9170 ffff8000f8842701 0000000000000008 [ 952.357083] fe80: 0000ffff8ffe9278 ffff80008ff85500 0000ffff8ffe90c0 ffff800000085c84 [ 952.359800] fea0: 0000000000000000 0000ffff8ffe9170 ffffffffffffffff 0000ffff90d473bc [ 952.365351] fec0: 0000000000000000 0000000000000015 0000000000000008 0000000040489426 [ 952.369550] fee0: 0000ffff8ffe9278 0000ffff907ea790 0000ffff907ea170 0000ffff907ea790 [ 952.372416] ff00: 0000ffff907ea170 0000000000000000 
000000000000001d 0000000000000004 [ 952.375223] ff20: 0000ffff90a32220 00000000003d0f00 0000ffff907ea0a0 0000ffff8ffe8f30 [ 952.378099] ff40: 0000ffff9100f554 0000ffff91147000 0000ffff91117bc0 0000ffff90d473b0 [ 952.381115] ff60: 0000ffff9100f620 0000ffff880069b0 0000ffff8ffe9170 0000ffff8ffe91a0 [ 952.384003] ff80: 0000ffff8ffe9160 0000ffff8ffe9140 0000ffff88006990 0000ffff8ffe9278 [ 952.386860] ffa0: 0000ffff88008a60 0000ffff8ffe9480 0000ffff88014ca0 0000ffff8ffe90c0 [ 952.389654] ffc0: 0000ffff910be8e8 0000ffff8ffe90c0 0000ffff90d473bc 0000000000000000 [ 952.410986] ffe0: 0000000000000008 000000000000001d 6e2079747265706f 72616d223d656d61 [ 952.415497] Call trace: [ 952.417403] [<ffff7ffffc39de7c>] changed_cb+0x9f4/0xa48 [btrfs] [ 952.420023] [<ffff7ffffc308fc0>] btrfs_compare_trees+0x500/0x6b0 [btrfs] [ 952.422759] [<ffff7ffffc39ecfc>] btrfs_ioctl_send+0xb4c/0xe10 [btrfs] [ 952.425601] [<ffff7ffffc3643c0>] btrfs_ioctl+0x374/0x29a4 [btrfs] [ 952.428031] [<ffff8000002499e8>] do_vfs_ioctl+0x33c/0x600 [ 952.430360] [<ffff800000249d3c>] SyS_ioctl+0x90/0xa4 [ 952.432552] [<ffff800000085c84>] el0_svc_naked+0x38/0x3c [ 952.434803] Code: 2a1503e0 17fffdac b9404282 17ffff28 (d4210000) [ 952.437457] ---[ end trace 9afd7090c466cf15 ]--- Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:21 UTC
02fffa1 Btrfs: don't BUG() during drop snapshot commit 4867268c57ff709a7b6b86ae6f6537d846d1443a upstream. A lot of things can go wrong here, so kill all the BUG_ON()s, replace the logic ones with ASSERT()s, and return EIO instead. Signed-off-by: Josef Bacik <jbacik@fb.com> [ switched to btrfs_err, errors go to common label ] Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:20 UTC
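To illustrate the idea in the commit above, here is a minimal Python sketch of returning an error code where a BUG_ON()-style check would have halted the system. It is not the kernel patch: `drop_ref_checked` and its `refs` argument are hypothetical names chosen for the example, and only the use of EIO comes from the commit message.

```python
import errno


def drop_ref_checked(refs):
    """Hypothetical helper: return the new reference count, or -EIO.

    A BUG_ON()-style version would assert `refs > 0` and bring everything
    down; here the caller gets an error it can propagate instead.
    """
    if refs <= 0:
        # State we cannot trust: report an I/O error upward.
        return -errno.EIO
    return refs - 1
```

A caller would check for a negative return value and unwind through its normal error path rather than crashing.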
0f2e022 btrfs: fix a possible umount deadlock commit 0ccd05285e7f5a8e297e1d6dfc41e7c65757d6fa upstream. btrfs_show_devname() takes the device_list_mutex, and a call to blkdev_put() can sometimes lead the VFS to call into this function. So, for now, call blkdev_put() outside of device_list_mutex. [ 983.284212] ====================================================== [ 983.290401] [ INFO: possible circular locking dependency detected ] [ 983.296677] 4.8.0-rc5-ceph-00023-g1b39cec2 #1 Not tainted [ 983.302081] ------------------------------------------------------- [ 983.308357] umount/21720 is trying to acquire lock: [ 983.313243] (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff9128ec51>] blkdev_put+0x31/0x150 [ 983.321264] [ 983.321264] but task is already holding lock: [ 983.327101] (&fs_devs->device_list_mutex){+.+...}, at: [<ffffffffc033d6f6>] __btrfs_close_devices+0x46/0x200 [btrfs] [ 983.337839] [ 983.337839] which lock already depends on the new lock. [ 983.337839] [ 983.346024] [ 983.346024] the existing dependency chain (in reverse order) is: [ 983.353512] -> #4 (&fs_devs->device_list_mutex){+.+...}: [ 983.359096] [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0 [ 983.365143] [<ffffffff91823125>] mutex_lock_nested+0x65/0x350 [ 983.371521] [<ffffffffc02d8116>] btrfs_show_devname+0x36/0x1f0 [btrfs] [ 983.378710] [<ffffffff9129523e>] show_vfsmnt+0x4e/0x150 [ 983.384593] [<ffffffff9126ffc7>] m_show+0x17/0x20 [ 983.389957] [<ffffffff91276405>] seq_read+0x2b5/0x3b0 [ 983.395669] [<ffffffff9124c808>] __vfs_read+0x28/0x100 [ 983.401464] [<ffffffff9124eb3b>] vfs_read+0xab/0x150 [ 983.407080] [<ffffffff9124ec32>] SyS_read+0x52/0xb0 [ 983.412609] [<ffffffff91825fc0>] entry_SYSCALL_64_fastpath+0x23/0xc1 [ 983.419617] -> #3 (namespace_sem){++++++}: [ 983.424024] [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0 [ 983.430074] [<ffffffff918239e9>] down_write+0x49/0x80 [ 983.435785] [<ffffffff91272457>] lock_mount+0x67/0x1c0 [ 983.441582] [<ffffffff91272ab2>] do_add_mount+0x32/0xf0 [ 983.447458]
[<ffffffff9127363a>] finish_automount+0x5a/0xc0 [ 983.453682] [<ffffffff91259513>] follow_managed+0x1b3/0x2a0 [ 983.459912] [<ffffffff9125b750>] lookup_fast+0x300/0x350 [ 983.465875] [<ffffffff9125d6e7>] path_openat+0x3a7/0xaa0 [ 983.471846] [<ffffffff9125ef75>] do_filp_open+0x85/0xe0 [ 983.477731] [<ffffffff9124c41c>] do_sys_open+0x14c/0x1f0 [ 983.483702] [<ffffffff9124c4de>] SyS_open+0x1e/0x20 [ 983.489240] [<ffffffff91825fc0>] entry_SYSCALL_64_fastpath+0x23/0xc1 [ 983.496254] -> #2 (&sb->s_type->i_mutex_key#3){+.+.+.}: [ 983.501798] [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0 [ 983.507855] [<ffffffff918239e9>] down_write+0x49/0x80 [ 983.513558] [<ffffffff91366237>] start_creating+0x87/0x100 [ 983.519703] [<ffffffff91366647>] debugfs_create_dir+0x17/0x100 [ 983.526195] [<ffffffff911df153>] bdi_register+0x93/0x210 [ 983.532165] [<ffffffff911df313>] bdi_register_owner+0x43/0x70 [ 983.538570] [<ffffffff914080fb>] device_add_disk+0x1fb/0x450 [ 983.544888] [<ffffffff91580226>] loop_add+0x1e6/0x290 [ 983.550596] [<ffffffff91fec358>] loop_init+0x10b/0x14f [ 983.556394] [<ffffffff91002207>] do_one_initcall+0xa7/0x180 [ 983.562618] [<ffffffff91f932e0>] kernel_init_freeable+0x1cc/0x266 [ 983.569370] [<ffffffff918174be>] kernel_init+0xe/0x100 [ 983.575166] [<ffffffff9182620f>] ret_from_fork+0x1f/0x40 [ 983.581131] -> #1 (loop_index_mutex){+.+.+.}: [ 983.585801] [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0 [ 983.591858] [<ffffffff91823125>] mutex_lock_nested+0x65/0x350 [ 983.598256] [<ffffffff9157ed3f>] lo_open+0x1f/0x60 [ 983.603704] [<ffffffff9128eec3>] __blkdev_get+0x123/0x400 [ 983.609757] [<ffffffff9128f4ea>] blkdev_get+0x34a/0x350 [ 983.615639] [<ffffffff9128f554>] blkdev_open+0x64/0x80 [ 983.621428] [<ffffffff9124aff6>] do_dentry_open+0x1c6/0x2d0 [ 983.627651] [<ffffffff9124c029>] vfs_open+0x69/0x80 [ 983.633181] [<ffffffff9125db74>] path_openat+0x834/0xaa0 [ 983.639152] [<ffffffff9125ef75>] do_filp_open+0x85/0xe0 [ 983.645035] [<ffffffff9124c41c>] 
do_sys_open+0x14c/0x1f0 [ 983.650999] [<ffffffff9124c4de>] SyS_open+0x1e/0x20 [ 983.656535] [<ffffffff91825fc0>] entry_SYSCALL_64_fastpath+0x23/0xc1 [ 983.663541] -> #0 (&bdev->bd_mutex){+.+.+.}: [ 983.668107] [<ffffffff910def43>] __lock_acquire+0x1003/0x17b0 [ 983.674510] [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0 [ 983.680561] [<ffffffff91823125>] mutex_lock_nested+0x65/0x350 [ 983.686967] [<ffffffff9128ec51>] blkdev_put+0x31/0x150 [ 983.692761] [<ffffffffc033481f>] btrfs_close_bdev+0x4f/0x60 [btrfs] [ 983.699699] [<ffffffffc033d77b>] __btrfs_close_devices+0xcb/0x200 [btrfs] [ 983.707178] [<ffffffffc033d8db>] btrfs_close_devices+0x2b/0xa0 [btrfs] [ 983.714380] [<ffffffffc03081c5>] close_ctree+0x265/0x340 [btrfs] [ 983.721061] [<ffffffffc02d7959>] btrfs_put_super+0x19/0x20 [btrfs] [ 983.727908] [<ffffffff91250e2f>] generic_shutdown_super+0x6f/0x100 [ 983.734744] [<ffffffff91250f56>] kill_anon_super+0x16/0x30 [ 983.740888] [<ffffffffc02da97e>] btrfs_kill_super+0x1e/0x130 [btrfs] [ 983.747909] [<ffffffff91250fe9>] deactivate_locked_super+0x49/0x80 [ 983.754745] [<ffffffff912515fd>] deactivate_super+0x5d/0x70 [ 983.760977] [<ffffffff91270a1c>] cleanup_mnt+0x5c/0x80 [ 983.766773] [<ffffffff91270a92>] __cleanup_mnt+0x12/0x20 [ 983.772738] [<ffffffff910aa2fe>] task_work_run+0x7e/0xc0 [ 983.778708] [<ffffffff91081b5a>] exit_to_usermode_loop+0x7e/0xb4 [ 983.785373] [<ffffffff910039eb>] syscall_return_slowpath+0xbb/0xd0 [ 983.792212] [<ffffffff9182605c>] entry_SYSCALL_64_fastpath+0xbf/0xc1 [ 983.799225] [ 983.799225] other info that might help us debug this: [ 983.799225] [ 983.807291] Chain exists of: &bdev->bd_mutex --> namespace_sem --> &fs_devs->device_list_mutex [ 983.816521] Possible unsafe locking scenario: [ 983.816521] [ 983.822489] CPU0 CPU1 [ 983.827043] ---- ---- [ 983.831599] lock(&fs_devs->device_list_mutex); [ 983.836289] lock(namespace_sem); [ 983.842268] lock(&fs_devs->device_list_mutex); [ 983.849478] lock(&bdev->bd_mutex); [ 983.853127] [ 983.853127] 
*** DEADLOCK *** [ 983.853127] [ 983.859113] 3 locks held by umount/21720: [ 983.863145] #0: (&type->s_umount_key#35){++++..}, at: [<ffffffff912515f5>] deactivate_super+0x55/0x70 [ 983.872713] #1: (uuid_mutex){+.+.+.}, at: [<ffffffffc033d8d3>] btrfs_close_devices+0x23/0xa0 [btrfs] [ 983.882206] #2: (&fs_devs->device_list_mutex){+.+...}, at: [<ffffffffc033d6f6>] __btrfs_close_devices+0x46/0x200 [btrfs] [ 983.893422] [ 983.893422] stack backtrace: [ 983.897824] CPU: 6 PID: 21720 Comm: umount Not tainted 4.8.0-rc5-ceph-00023-g1b39cec2 #1 [ 983.905958] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 09/07/2015 [ 983.913492] 0000000000000000 ffff8c8a53c17a38 ffffffff91429521 ffffffff9260f4f0 [ 983.921018] ffffffff92642760 ffff8c8a53c17a88 ffffffff911b2b04 0000000000000050 [ 983.928542] ffffffff9237d620 ffff8c8a5294aee0 ffff8c8a5294aeb8 ffff8c8a5294aee0 [ 983.936072] Call Trace: [ 983.938545] [<ffffffff91429521>] dump_stack+0x85/0xc4 [ 983.943715] [<ffffffff911b2b04>] print_circular_bug+0x1fb/0x20c [ 983.949748] [<ffffffff910def43>] __lock_acquire+0x1003/0x17b0 [ 983.955613] [<ffffffff910dfd0c>] lock_acquire+0x1bc/0x1f0 [ 983.961123] [<ffffffff9128ec51>] ? blkdev_put+0x31/0x150 [ 983.966550] [<ffffffff91823125>] mutex_lock_nested+0x65/0x350 [ 983.972407] [<ffffffff9128ec51>] ? blkdev_put+0x31/0x150 [ 983.977832] [<ffffffff9128ec51>] blkdev_put+0x31/0x150 [ 983.983101] [<ffffffffc033481f>] btrfs_close_bdev+0x4f/0x60 [btrfs] [ 983.989500] [<ffffffffc033d77b>] __btrfs_close_devices+0xcb/0x200 [btrfs] [ 983.996415] [<ffffffffc033d8db>] btrfs_close_devices+0x2b/0xa0 [btrfs] [ 984.003068] [<ffffffffc03081c5>] close_ctree+0x265/0x340 [btrfs] [ 984.009189] [<ffffffff9126cc5e>] ? 
evict_inodes+0x15e/0x170 [ 984.014881] [<ffffffffc02d7959>] btrfs_put_super+0x19/0x20 [btrfs] [ 984.021176] [<ffffffff91250e2f>] generic_shutdown_super+0x6f/0x100 [ 984.027476] [<ffffffff91250f56>] kill_anon_super+0x16/0x30 [ 984.033082] [<ffffffffc02da97e>] btrfs_kill_super+0x1e/0x130 [btrfs] [ 984.039548] [<ffffffff91250fe9>] deactivate_locked_super+0x49/0x80 [ 984.045839] [<ffffffff912515fd>] deactivate_super+0x5d/0x70 [ 984.051525] [<ffffffff91270a1c>] cleanup_mnt+0x5c/0x80 [ 984.056774] [<ffffffff91270a92>] __cleanup_mnt+0x12/0x20 [ 984.062201] [<ffffffff910aa2fe>] task_work_run+0x7e/0xc0 [ 984.067625] [<ffffffff91081b5a>] exit_to_usermode_loop+0x7e/0xb4 [ 984.073747] [<ffffffff910039eb>] syscall_return_slowpath+0xbb/0xd0 [ 984.080038] [<ffffffff9182605c>] entry_SYSCALL_64_fastpath+0xbf/0xc1 Reported-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:20 UTC
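The lock-ordering fix described in the commit above (call blkdev_put() outside of device_list_mutex) can be sketched in user space as follows. This is a hedged Python model, not the kernel patch: `Device`, `close_devices`, and the `put_bdev` callback are hypothetical stand-ins, and `threading.Lock` plays the role of device_list_mutex.

```python
import threading


class Device:
    """Hypothetical stand-in for a device with an open block device."""

    def __init__(self, name):
        self.name = name
        self.bdev_open = True


def close_devices(devices, device_list_mutex, put_bdev):
    """Detach devices under the list lock, then release them after unlocking.

    put_bdev stands in for blkdev_put(): it may take other locks, so it
    must not run while device_list_mutex is held.
    """
    pending = []
    with device_list_mutex:
        for dev in devices:
            if dev.bdev_open:
                dev.bdev_open = False  # list state changes under the lock
                pending.append(dev)
    # The lock is dropped here; only now call into the other subsystem.
    for dev in pending:
        put_bdev(dev)
    return len(pending)
```

Deferring the release shortens the critical section and removes the device_list_mutex to bd_mutex dependency edge that lockdep reported, at the cost of a small temporary list.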
65563ab Btrfs: fix memory leak in do_walk_down commit a958eab0ed7fdc1b977bc25d3af6efedaa945488 upstream. The extent buffer 'next' needs to be free'd conditionally. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:20 UTC
364b85c btrfs: clean the old superblocks before freeing the device commit cea67ab92d3d4da9f2b4141d87cb8664757daca0 upstream. btrfs_rm_device frees the block device but then re-opens it using the saved device name. A race exists between the close and the re-open that allows the block size to be changed. The result is getting stuck forever in the reclaim loop in __getblk_slow. This patch moves the superblock cleanup before closing the block device, which is also consistent with other callers. We also don't need a private copy of dev_name as the whole routine operates under the uuid_mutex. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:20 UTC
6a6e927 Btrfs: don't leak reloc root nodes on error commit 6bdf131fac2336adb1a628f992ba32384f653a55 upstream. We don't track the reloc roots in any sort of normal way, so the only way the root/commit_root nodes get free'd is if the relocation finishes successfully and the reloc root is deleted. Fix this by free'ing them in free_reloc_roots. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:20 UTC
4d3d9b5 Btrfs: return gracefully from balance if fs tree is corrupted commit 3561b9db70928f207be4570b48fc19898eeaef54 upstream. When relocating tree blocks, we first get block information from back references in the extent tree, then search the fs tree to find all parents of a block. However, if the fs tree is corrupted, e.g. if some items are missing, we could come across these WARN_ONs and BUG_ONs. This makes us print some error messages and return gracefully from balance instead. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:20 UTC
a6522e4 Btrfs: bail out if block group has different mixed flag commit 49303381f19ab16a371a061b67e783d3f570d56e upstream. Currently we allow an inconsistency in the mixed flag (BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA). We'd get ENOSPC if a block group has the mixed flag and btrfs doesn't. If that happens, we have one space_info with the mixed flag and another space_info with only BTRFS_BLOCK_GROUP_METADATA, and global_block_rsv.space_info points to the latter one, but all bytes from the block_group contribute to the mixed space_info, so every allocation fails with ENOSPC. This adds a check for the above case. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> [ updated message ] Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:20 UTC
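The check described in the commit above can be pictured with a small Python sketch. The bit positions mirror the btrfs on-disk flag values for DATA and METADATA; the function name `check_block_group`, the `fs_mixed` parameter, and the choice of EINVAL as the error code are assumptions made for illustration, not taken from the commit text.

```python
import errno

# Block group type bits (same positions as the btrfs on-disk flags).
BLOCK_GROUP_DATA = 1 << 0
BLOCK_GROUP_METADATA = 1 << 2
MIXED_MASK = BLOCK_GROUP_DATA | BLOCK_GROUP_METADATA


def check_block_group(bg_flags, fs_mixed):
    """Hypothetical check: reject a mixed block group on a non-mixed fs.

    Returns 0 when consistent, or a negative errno (EINVAL is an
    assumption here) so the caller can bail out early.
    """
    bg_mixed = (bg_flags & MIXED_MASK) == MIXED_MASK
    if bg_mixed and not fs_mixed:
        return -errno.EINVAL
    return 0
```

Failing early at load time is preferable to the symptom the commit describes, where every later allocation fails with ENOSPC for no visible reason.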
d7839ad Btrfs: fix memory leak in reading btree blocks commit 2571e739677f1e4c0c63f5ed49adcc0857923625 upstream. We can read a btree block via readahead or an intentional read, and we can end up with a memory leak when the following happens: 1) readahead starts to read block A but does not wait for read completion, 2) btree_readpage_end_io_hook finds that block A is corrupted, and it needs to clear the uptodate bit on all of block A's pages, 3) meanwhile an intentional read kicks in and checks the uptodate bit of block A's pages to decide which pages need to be read, 4) some pages have the uptodate bit set during 3)'s check, so 3) doesn't count them in eb->io_pages, but they are later cleared by 2), so we have to readpage on them; we get the wrong eb->io_pages, which results in a memory leak of this block. This fixes the problem by first locking all the pages and then checking the pages' uptodate bit.

t1(readahead)                       t2(readahead endio)                         t3(the following read)
read_extent_buffer_pages            end_bio_extent_readpage
  for pg in eb:                       for page 0,1,2 in eb:
    if pg is uptodate:                  btree_readpage_end_io_hook(pg)
      num_reads++                       if uptodate:
  eb->io_pages = num_reads                SetPageUptodate(pg)                   _______________
  for pg in eb:                       for page 3 in eb:                         read_extent_buffer_pages
    if pg is NOT uptodate:              btree_readpage_end_io_hook(pg)            for pg in eb:
      __extent_read_full_page(pg)         sanity check reports something wrong      if pg is uptodate:
                                          clear_extent_buffer_uptodate(eb)            num_reads++
                                            for pg in eb:                         eb->io_pages = num_reads
                                              ClearPageUptodate(page)           _______________
                                                                                  for pg in eb:
                                                                                    if pg is NOT uptodate:
                                                                                      __extent_read_full_page(pg)

So t3's eb->io_pages is not consistent with the number of pages it's reading, and during endio(), atomic_dec_and_test(&eb->io_pages) will get a negative number so that we're not able to free the eb.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 06 January 2017, 10:16:20 UTC
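The mismatch between eb->io_pages and the number of reads actually issued, as laid out in the timeline of the commit message above, can be modelled with a deterministic Python sketch. Everything here is a simplification and an assumption: `Page`, `read_two_pass`, `read_locked`, and `endio_clears_uptodate` are hypothetical names, the `interfere` callback stands in for t2 running concurrently, and a locked page is modelled as one the interfering path cannot touch (in the kernel the other side would wait on the page lock instead).

```python
from dataclasses import dataclass


@dataclass
class Page:
    uptodate: bool
    locked: bool = False


def read_two_pass(pages, interfere):
    """Pre-fix shape: count first, then re-check each page when issuing reads."""
    io_pages = sum(1 for pg in pages if not pg.uptodate)
    interfere(pages)  # t2's endio clears uptodate bits in between
    issued = sum(1 for pg in pages if not pg.uptodate)
    return io_pages, issued


def read_locked(pages, interfere):
    """Post-fix shape: lock every page, then count and issue from one view."""
    for pg in pages:
        pg.locked = True
    io_pages = sum(1 for pg in pages if not pg.uptodate)
    interfere(pages)  # cannot change pages that are held locked
    issued = sum(1 for pg in pages if not pg.uptodate)
    for pg in pages:
        pg.locked = False
    return io_pages, issued


def endio_clears_uptodate(pages):
    """Model of t2: clear the uptodate bit on every page it can reach."""
    for pg in pages:
        if not pg.locked:
            pg.uptodate = False
```

With three uptodate pages and one stale page, `read_two_pass` counts one read but issues four, which is the inconsistency the commit describes, while `read_locked` keeps the two numbers equal.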