Revision 0f640dca08330dfc7820d610578e5935b5e654b2 authored by Mike Snitzer on 31 January 2013, 14:11:14 UTC, committed by Alasdair G Kergon on 31 January 2013, 14:11:14 UTC
thin_io_hints() is blindly copying the queue limits from the thin-pool, which can lead to incorrect limits being set. The fix here simply deletes the thin_io_hints() hook, which leaves the existing stacking infrastructure to set the limits correctly.

When a thin-pool uses an MD device for the data device, a thin device from the thin-pool must respect MD's constraint that a bio must not span multiple chunks. Otherwise we can see problems. If the raid0 chunksize is 1152K and the thin-pool chunksize is 256K, I see the following md/raid0 error (with extra debug tracing added to thin_endio) when mkfs.xfs is executed against the thin device:

```
md/raid0:md99: make_request bug: can't convert block across chunks or bigger than 1152k 6688 127
device-mapper: thin: bio sector=2080 err=-5 bi_size=130560 bi_rw=17 bi_vcnt=32 bi_idx=0
```

This extra DM debugging shows that the failing bio spans the first and second logical 1152K chunks (sector 2080 + 255 takes the bio beyond the first chunk's boundary at sector 2304). So the bio splitting that DM is doing clearly isn't respecting the MD limits.

max_hw_sectors_kb is 127 for both the thin-pool and thin device (queue_max_hw_sectors returns 255, so we'll excuse sysfs's lack of precision). That explains why bi_size is 130560. But the thin device's max_hw_sectors_kb should be 4 (PAGE_SIZE), given that it doesn't have a .merge function (for bio_add_page to consult indirectly via dm_merge_bvec) yet the thin-pool sits above an MD device that has a compulsory merge_bvec_fn. This scenario is exactly why DM must resort to sending single PAGE_SIZE bios to the underlying layer. Some additional context is available in the header for commit 8cbeb67a ("dm: avoid unsupported spanning of md stripe boundaries").
Long story short, the reason a thin device doesn't properly get configured with a max_hw_sectors_kb of 4 (PAGE_SIZE) is that thin_io_hints() blindly copies the queue limits from the thin-pool device directly to the thin device's queue limits.

Fix this by eliminating thin_io_hints(). Doing so is safe because the block layer's queue-limits stacking already enables the upper-level thin device to inherit the thin-pool device's discard, minimum_io_size, and optimal_io_size limits that get set in pool_io_hints(). But avoiding the wholesale copy allows the thin and thin-pool limits to differ where it is important, namely max_hw_sectors_kb.

Reported-by: Daniel Browning <db@kavod.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
1 parent 949db15
File | Mode | Size |
---|---|---|
Kconfig | -rw-r--r-- | 15.0 KB |
Kconfig.debug | -rw-r--r-- | 1015 bytes |
Makefile | -rw-r--r-- | 2.0 KB |
backing-dev.c | -rw-r--r-- | 21.2 KB |
balloon_compaction.c | -rw-r--r-- | 9.6 KB |
bootmem.c | -rw-r--r-- | 21.1 KB |
bounce.c | -rw-r--r-- | 6.6 KB |
cleancache.c | -rw-r--r-- | 6.5 KB |
compaction.c | -rw-r--r-- | 32.4 KB |
debug-pagealloc.c | -rw-r--r-- | 2.1 KB |
dmapool.c | -rw-r--r-- | 13.1 KB |
fadvise.c | -rw-r--r-- | 3.6 KB |
failslab.c | -rw-r--r-- | 1.3 KB |
filemap.c | -rw-r--r-- | 67.3 KB |
filemap_xip.c | -rw-r--r-- | 11.3 KB |
fremap.c | -rw-r--r-- | 6.8 KB |
frontswap.c | -rw-r--r-- | 10.3 KB |
highmem.c | -rw-r--r-- | 9.9 KB |
huge_memory.c | -rw-r--r-- | 73.7 KB |
hugetlb.c | -rw-r--r-- | 82.4 KB |
hugetlb_cgroup.c | -rw-r--r-- | 10.7 KB |
hwpoison-inject.c | -rw-r--r-- | 3.3 KB |
init-mm.c | -rw-r--r-- | 619 bytes |
internal.h | -rw-r--r-- | 11.1 KB |
interval_tree.c | -rw-r--r-- | 3.2 KB |
kmemcheck.c | -rw-r--r-- | 2.8 KB |
kmemleak-test.c | -rw-r--r-- | 3.3 KB |
kmemleak.c | -rw-r--r-- | 52.5 KB |
ksm.c | -rw-r--r-- | 55.2 KB |
maccess.c | -rw-r--r-- | 1.6 KB |
madvise.c | -rw-r--r-- | 11.9 KB |
memblock.c | -rw-r--r-- | 29.1 KB |
memcontrol.c | -rw-r--r-- | 178.6 KB |
memory-failure.c | -rw-r--r-- | 42.3 KB |
memory.c | -rw-r--r-- | 113.9 KB |
memory_hotplug.c | -rw-r--r-- | 35.7 KB |
mempolicy.c | -rw-r--r-- | 70.9 KB |
mempool.c | -rw-r--r-- | 10.5 KB |
migrate.c | -rw-r--r-- | 44.3 KB |
mincore.c | -rw-r--r-- | 7.8 KB |
mlock.c | -rw-r--r-- | 15.5 KB |
mm_init.c | -rw-r--r-- | 3.7 KB |
mmap.c | -rw-r--r-- | 80.6 KB |
mmu_context.c | -rw-r--r-- | 1.4 KB |
mmu_notifier.c | -rw-r--r-- | 9.4 KB |
mmzone.c | -rw-r--r-- | 1.9 KB |
mprotect.c | -rw-r--r-- | 10.2 KB |
mremap.c | -rw-r--r-- | 14.3 KB |
msync.c | -rw-r--r-- | 2.4 KB |
nobootmem.c | -rw-r--r-- | 11.2 KB |
nommu.c | -rw-r--r-- | 51.3 KB |
oom_kill.c | -rw-r--r-- | 19.4 KB |
page-writeback.c | -rw-r--r-- | 69.1 KB |
page_alloc.c | -rw-r--r-- | 169.5 KB |
page_cgroup.c | -rw-r--r-- | 11.9 KB |
page_io.c | -rw-r--r-- | 6.8 KB |
page_isolation.c | -rw-r--r-- | 7.0 KB |
pagewalk.c | -rw-r--r-- | 5.7 KB |
percpu-km.c | -rw-r--r-- | 2.8 KB |
percpu-vm.c | -rw-r--r-- | 12.9 KB |
percpu.c | -rw-r--r-- | 57.1 KB |
pgtable-generic.c | -rw-r--r-- | 4.6 KB |
process_vm_access.c | -rw-r--r-- | 13.3 KB |
quicklist.c | -rw-r--r-- | 2.4 KB |
readahead.c | -rw-r--r-- | 16.1 KB |
rmap.c | -rw-r--r-- | 51.6 KB |
shmem.c | -rw-r--r-- | 76.8 KB |
slab.c | -rw-r--r-- | 117.7 KB |
slab.h | -rw-r--r-- | 6.2 KB |
slab_common.c | -rw-r--r-- | 11.1 KB |
slob.c | -rw-r--r-- | 15.3 KB |
slub.c | -rw-r--r-- | 129.0 KB |
sparse-vmemmap.c | -rw-r--r-- | 5.9 KB |
sparse.c | -rw-r--r-- | 20.7 KB |
swap.c | -rw-r--r-- | 23.1 KB |
swap_state.c | -rw-r--r-- | 10.3 KB |
swapfile.c | -rw-r--r-- | 63.1 KB |
truncate.c | -rw-r--r-- | 18.3 KB |
util.c | -rw-r--r-- | 9.1 KB |
vmalloc.c | -rw-r--r-- | 66.0 KB |
vmscan.c | -rw-r--r-- | 99.7 KB |
vmstat.c | -rw-r--r-- | 33.9 KB |