https://github.com/torvalds/linux
Revision 80fa46d6b9e7b1527bfd2197d75431fd9c382161 authored by Theodore Ts'o on 01 September 2022, 22:03:14 UTC, committed by Theodore Ts'o on 22 September 2022, 14:51:19 UTC
This patch avoids threads live-locking for hours when a large number threads are competing over the last few free extents as they blocks getting added and removed from preallocation pools. From our bug reporter: A reliable way for triggering this has multiple writers continuously write() to files when the filesystem is full, while small amounts of space are freed (e.g. by truncating a large file -1MiB at a time). In the local filesystem, this can be done by simply not checking the return code of write (0) and/or the error (ENOSPACE) that is set. Over NFS with an async mount, even clients with proper error checking will behave this way since the linux NFS client implementation will not propagate the server errors [the write syscalls immediately return success] until the file handle is closed. This leads to a situation where NFS clients send a continuous stream of WRITE rpcs which result in ERRNOSPACE -- but since the client isn't seeing this, the stream of writes continues at maximum network speed. When some space does appear, multiple writers will all attempt to claim it for their current write. For NFS, we may see dozens to hundreds of threads that do this. The real-world scenario of this is database backup tooling (in particular, github.com/mdkent/percona-xtrabackup) which may write large files (>1TiB) to NFS for safe keeping. Some temporary files are written, rewound, and read back -- all before closing the file handle (the temp file is actually unlinked, to trigger automatic deletion on close/crash.) An application like this operating on an async NFS mount will not see an error code until TiB have been written/read. The lockup was observed when running this database backup on large filesystems (64 TiB in this case) with a high number of block groups and no free space. Fragmentation is generally not a factor in this filesystem (~thousands of large files, mostly contiguous except for the parts written while the filesystem is at capacity.) Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
1 parent 29a5b8a
Tip revision: 80fa46d6b9e7b1527bfd2197d75431fd9c382161 authored by Theodore Ts'o on 01 September 2022, 22:03:14 UTC
ext4: limit the number of retries after discarding preallocations blocks
ext4: limit the number of retries after discarding preallocations blocks
Tip revision: 80fa46d
File | Mode | Size |
---|---|---|
Documentation | ||
LICENSES | ||
arch | ||
block | ||
certs | ||
crypto | ||
drivers | ||
fs | ||
include | ||
init | ||
io_uring | ||
ipc | ||
kernel | ||
lib | ||
mm | ||
net | ||
samples | ||
scripts | ||
security | ||
sound | ||
tools | ||
usr | ||
virt | ||
.clang-format | -rw-r--r-- | 19.9 KB |
.cocciconfig | -rw-r--r-- | 59 bytes |
.get_maintainer.ignore | -rw-r--r-- | 151 bytes |
.gitattributes | -rw-r--r-- | 62 bytes |
.gitignore | -rw-r--r-- | 1.9 KB |
.mailmap | -rw-r--r-- | 23.7 KB |
COPYING | -rw-r--r-- | 496 bytes |
CREDITS | -rw-r--r-- | 99.1 KB |
Kbuild | -rw-r--r-- | 1.3 KB |
Kconfig | -rw-r--r-- | 555 bytes |
MAINTAINERS | -rw-r--r-- | 663.6 KB |
Makefile | -rw-r--r-- | 64.5 KB |
README | -rw-r--r-- | 727 bytes |
Computing file changes ...