Revision c61ea31dac0319ec64b33725917bda81fc293a25 authored by David Howells on 11 May 2010, 15:51:39 UTC, committed by Linus Torvalds on 11 May 2010, 17:07:53 UTC
Fix an occasional EIO returned by a call to vfs_unlink(): [ 4868.465413] CacheFiles: I/O Error: Unlink failed [ 4868.465444] FS-Cache: Cache cachefiles stopped due to I/O error [ 4947.320011] CacheFiles: File cache on md3 unregistering [ 4947.320041] FS-Cache: Withdrawing cache "mycache" [ 5127.348683] FS-Cache: Cache "mycache" added (type cachefiles) [ 5127.348716] CacheFiles: File cache on md3 registered [ 7076.871081] CacheFiles: I/O Error: Unlink failed [ 7076.871130] FS-Cache: Cache cachefiles stopped due to I/O error [ 7116.780891] CacheFiles: File cache on md3 unregistering [ 7116.780937] FS-Cache: Withdrawing cache "mycache" [ 7296.813394] FS-Cache: Cache "mycache" added (type cachefiles) [ 7296.813432] CacheFiles: File cache on md3 registered What happens is this: (1) A cached NFS file is seen to have become out of date, so NFS retires the object and immediately acquires a new object with the same key. (2) Retirement of the old object is done asynchronously - so the lookup/create to generate the new object may be done first. This can be a problem as the old object and the new object must exist at the same point in the backing filesystem (i.e. they must have the same pathname). (3) The lookup for the new object sees that a backing file already exists, checks to see whether it is valid and sees that it isn't. It then deletes that file and creates a new one on disk. (4) The retirement phase for the old file is then performed. It tries to delete the dentry it has, but ext4_unlink() returns -EIO because the inode attached to that dentry no longer matches the inode number associated with the filename in the parent directory. The trace below shows this quite well. [md5sum] ==> __fscache_relinquish_cookie(ffff88002d12fb58{NFS.fh,ffff88002ce62100},1) [md5sum] ==> __fscache_acquire_cookie({NFS.server},{NFS.fh},ffff88002ce62100) NFS has retired the old cookie and asked for a new one. [kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_ACTIVE,24}) [kslowd] <== fscache_object_state_machine() [->OBJECT_DYING] [kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_INIT,0}) [kslowd] <== fscache_object_state_machine() [->OBJECT_LOOKING_UP] [kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_DYING,24}) [kslowd] <== fscache_object_state_machine() [->OBJECT_RECYCLING] The old object (OBJ52) is going through the terminal states to get rid of it, whilst the new object - (OBJ53) - is coming into being. [kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_LOOKING_UP,0}) [kslowd] ==> cachefiles_walk_to_object({ffff88003029d8b8},OBJ53,@68,) [kslowd] lookup '@68' [kslowd] next -> ffff88002ce41bd0 positive [kslowd] advance [kslowd] lookup 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA' [kslowd] next -> ffff8800369faac8 positive The new object has looked up the subdir in which the file would be in (getting dentry ffff88002ce41bd0) and then looked up the file itself (getting dentry ffff8800369faac8). [kslowd] validate 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA' [kslowd] ==> cachefiles_bury_object(,'@68','Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA') [kslowd] remove ffff8800369faac8 from ffff88002ce41bd0 [kslowd] unlink stale object [kslowd] <== cachefiles_bury_object() = 0 It then checks the file's xattrs to see if it's valid. NFS says that the auxiliary data indicate the file is out of date (obvious to us - that's why NFS ditched the old version and got a new one). CacheFiles then deletes the old file (dentry ffff8800369faac8). [kslowd] redo lookup [kslowd] lookup 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA' [kslowd] next -> ffff88002cd94288 negative [kslowd] create -> ffff88002cd94288{ffff88002cdaf238{ino=148247}} CacheFiles then redoes the lookup and gets a negative result in a new dentry (ffff88002cd94288) which it then creates a file for. [kslowd] ==> cachefiles_mark_object_active(,OBJ53) [kslowd] <== cachefiles_mark_object_active() = 0 [kslowd] === OBTAINED_OBJECT === [kslowd] <== cachefiles_walk_to_object() = 0 [148247] [kslowd] <== fscache_object_state_machine() [->OBJECT_AVAILABLE] The new object is then marked active and the state machine moves to the available state - at which point NFS can start filling the object. [kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_RECYCLING,20}) [kslowd] ==> fscache_release_object() [kslowd] ==> cachefiles_drop_object({OBJ52,2}) [kslowd] ==> cachefiles_delete_object(,OBJ52{ffff8800369faac8}) The old object, meanwhile, goes on with being retired. If allocation occurs first, cachefiles_delete_object() has to wait for dir->d_inode->i_mutex to become available before it can continue. [kslowd] ==> cachefiles_bury_object(,'@68','Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA') [kslowd] remove ffff8800369faac8 from ffff88002ce41bd0 [kslowd] unlink stale object EXT4-fs warning (device sda6): ext4_unlink: Inode number mismatch in unlink (148247!=148193) CacheFiles: I/O Error: Unlink failed FS-Cache: Cache cachefiles stopped due to I/O error CacheFiles then tries to delete the file for the old object, but the dentry it has (ffff8800369faac8) no longer points to a valid inode for that directory entry, and so ext4_unlink() returns -EIO when de->inode does not match i_ino. [kslowd] <== cachefiles_bury_object() = -5 [kslowd] <== cachefiles_delete_object() = -5 [kslowd] <== fscache_object_state_machine() [->OBJECT_DEAD] [kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_AVAILABLE,0}) [kslowd] <== fscache_object_state_machine() [->OBJECT_ACTIVE] (Note that the above trace includes extra information beyond that produced by the upstream code). The fix is to note when an object that is being retired has had its object deleted preemptively by a replacement object that is being created, and to skip the second removal attempt in such a case. Reported-by: Greg M <gregm@servu.net.au> Reported-by: Mark Moseley <moseleymark@gmail.com> Reported-by: Romain DEGEZ <romain.degez@smartjog.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1 parent 7d6fb7b
Kconfig
#
# For a description of the syntax of this configuration file,
# see Documentation/kbuild/kconfig-language.txt.
#
mainmenu "Linux Kernel Configuration"
config AVR32
def_bool y
# With EMBEDDED=n, we get lots of stuff automatically selected
# that we usually don't need on AVR32.
select EMBEDDED
select HAVE_CLK
select HAVE_OPROFILE
select HAVE_KPROBES
help
AVR32 is a high-performance 32-bit RISC microprocessor core,
designed for cost-sensitive embedded applications, with particular
emphasis on low power consumption and high code density.
There is an AVR32 Linux project with a web page at
http://avr32linux.org/.
config GENERIC_GPIO
def_bool y
config GENERIC_HARDIRQS
def_bool y
config STACKTRACE_SUPPORT
def_bool y
config LOCKDEP_SUPPORT
def_bool y
config TRACE_IRQFLAGS_SUPPORT
def_bool y
config HARDIRQS_SW_RESEND
def_bool y
config GENERIC_IRQ_PROBE
def_bool y
config RWSEM_GENERIC_SPINLOCK
def_bool y
config GENERIC_TIME
def_bool y
config GENERIC_CLOCKEVENTS
def_bool y
config RWSEM_XCHGADD_ALGORITHM
def_bool n
config ARCH_HAS_ILOG2_U32
def_bool n
config ARCH_HAS_ILOG2_U64
def_bool n
config GENERIC_HWEIGHT
def_bool y
config GENERIC_CALIBRATE_DELAY
def_bool y
config GENERIC_BUG
def_bool y
depends on BUG
source "init/Kconfig"
source "kernel/Kconfig.freezer"
menu "System Type and features"
source "kernel/time/Kconfig"
config SUBARCH_AVR32B
bool
config MMU
bool
config PERFORMANCE_COUNTERS
bool
config PLATFORM_AT32AP
bool
select SUBARCH_AVR32B
select MMU
select PERFORMANCE_COUNTERS
select ARCH_REQUIRE_GPIOLIB
select GENERIC_ALLOCATOR
select HAVE_FB_ATMEL
#
# CPU types
#
# AP7000 derivatives
config CPU_AT32AP700X
bool
select PLATFORM_AT32AP
config CPU_AT32AP7000
bool
select CPU_AT32AP700X
config CPU_AT32AP7001
bool
select CPU_AT32AP700X
config CPU_AT32AP7002
bool
select CPU_AT32AP700X
# AP700X boards
config BOARD_ATNGW100_COMMON
bool
select CPU_AT32AP7000
choice
prompt "AVR32 board type"
default BOARD_ATSTK1000
config BOARD_ATSTK1000
bool "ATSTK1000 evaluation board"
config BOARD_ATNGW100_MKI
bool "ATNGW100 Network Gateway"
select BOARD_ATNGW100_COMMON
config BOARD_ATNGW100_MKII
bool "ATNGW100 mkII Network Gateway"
select BOARD_ATNGW100_COMMON
config BOARD_HAMMERHEAD
bool "Hammerhead board"
select CPU_AT32AP7000
select USB_ARCH_HAS_HCD
help
The Hammerhead platform is built around an AVR32 32-bit microcontroller from Atmel.
It offers versatile peripherals, such as ethernet, usb device, usb host etc.
The board also incorporates a power supply and is a Power over Ethernet (PoE) Powered
Device (PD).
Additionally, a Cyclone III FPGA from Altera is integrated on the board. The FPGA is
mapped into the 32-bit AVR memory bus. The FPGA offers two DDR2 SDRAM interfaces, which
will cover even the most exceptional need of memory bandwidth. Together with the onboard
video decoder the board is ready for video processing.
For more information see: http://www.miromico.com/hammerhead
config BOARD_FAVR_32
bool "Favr-32 LCD-board"
select CPU_AT32AP7000
config BOARD_MERISC
bool "Merisc board"
select CPU_AT32AP7000
help
Merisc is the family name for a range of AVR32-based boards.
The boards are designed to be used in a man-machine
interfacing environment, utilizing a touch-based graphical
user interface. They host a vast range of I/O peripherals as
well as a large SDRAM & Flash memory bank.
For more information see: http://www.martinsson.se/merisc
config BOARD_MIMC200
bool "MIMC200 CPU board"
select CPU_AT32AP7000
endchoice
source "arch/avr32/boards/atstk1000/Kconfig"
source "arch/avr32/boards/atngw100/Kconfig"
source "arch/avr32/boards/hammerhead/Kconfig"
source "arch/avr32/boards/favr-32/Kconfig"
source "arch/avr32/boards/merisc/Kconfig"
choice
prompt "Boot loader type"
default LOADER_U_BOOT
config LOADER_U_BOOT
bool "U-Boot (or similar) bootloader"
endchoice
source "arch/avr32/mach-at32ap/Kconfig"
config LOAD_ADDRESS
hex
default 0x10000000 if LOADER_U_BOOT=y && CPU_AT32AP700X=y
config ENTRY_ADDRESS
hex
default 0x90000000 if LOADER_U_BOOT=y && CPU_AT32AP700X=y
config PHYS_OFFSET
hex
default 0x10000000 if CPU_AT32AP700X=y
source "kernel/Kconfig.preempt"
config QUICKLIST
def_bool y
config HAVE_ARCH_BOOTMEM
def_bool n
config ARCH_HAVE_MEMORY_PRESENT
def_bool n
config NEED_NODE_MEMMAP_SIZE
def_bool n
config ARCH_FLATMEM_ENABLE
def_bool y
config ARCH_DISCONTIGMEM_ENABLE
def_bool n
config ARCH_SPARSEMEM_ENABLE
def_bool n
source "mm/Kconfig"
config OWNERSHIP_TRACE
bool "Ownership trace support"
default y
help
Say Y to generate an Ownership Trace message on every context switch,
enabling Nexus-compliant debuggers to keep track of the PID of the
currently executing task.
config NMI_DEBUGGING
bool "NMI Debugging"
default n
help
Say Y here and pass the nmi_debug command-line parameter to
the kernel to turn on NMI debugging. Depending on the value
of the nmi_debug option, various pieces of information will
be dumped to the console when a Non-Maskable Interrupt
happens.
# FPU emulation goes here
source "kernel/Kconfig.hz"
config CMDLINE
string "Default kernel command line"
default ""
help
If you don't have a boot loader capable of passing a command line string
to the kernel, you may specify one here. As a minimum, you should specify
the memory size and the root device (e.g., mem=8M, root=/dev/nfs).
endmenu
menu "Power management options"
source "kernel/power/Kconfig"
config ARCH_SUSPEND_POSSIBLE
def_bool y
menu "CPU Frequency scaling"
source "drivers/cpufreq/Kconfig"
config CPU_FREQ_AT32AP
bool "CPU frequency driver for AT32AP"
depends on CPU_FREQ && PLATFORM_AT32AP
default n
help
This enables the CPU frequency driver for AT32AP processors.
For details, take a look in <file:Documentation/cpu-freq>.
If in doubt, say N.
endmenu
endmenu
menu "Bus options"
config PCI
bool
source "drivers/pci/Kconfig"
source "drivers/pcmcia/Kconfig"
endmenu
menu "Executable file formats"
source "fs/Kconfig.binfmt"
endmenu
source "net/Kconfig"
source "drivers/Kconfig"
source "fs/Kconfig"
source "arch/avr32/Kconfig.debug"
source "security/Kconfig"
source "crypto/Kconfig"
source "lib/Kconfig"
Computing file changes ...