Revision 3ad33b2436b545cbe8b28e53f3710432cad457ab authored by Lee Schermerhorn on 15 November 2007, 00:59:10 UTC, committed by Linus Torvalds on 15 November 2007, 02:45:38 UTC
We hit the BUG_ON() in mm/rmap.c:vma_address() when trying to migrate via
mbind(MPOL_MF_MOVE) a non-anon region that spans multiple vmas.  For
anon-regions, we just fail to migrate any pages beyond the 1st vma in the
range.

This occurs because do_mbind() collects a list of pages to migrate by
calling check_range().  check_range() walks the task's mm, spanning vmas as
necessary, to collect the migratable pages into a list.  Then, do_mbind()
calls migrate_pages() passing the list of pages, a function to allocate new
pages based on vma policy [new_vma_page()], and a pointer to the first vma
of the range.

For each page in the list, new_vma_page() calls page_address_in_vma()
passing the page and the vma [first in range] to obtain the address to
pass to alloc_page_vma().  The page address is needed to get interleaving
policy correct.  If the pages in the list come from multiple vmas,
eventually, new_vma_page() will pass a page to page_address_in_vma()
with the incorrect vma.  For !PageAnon pages, this will result in a bug
check in rmap.c:vma_address().  For anon pages, vma_address() will just
return -EFAULT and fail the migration.

This patch modifies new_vma_page() to check the return value from
page_address_in_vma().  If the return value is -EFAULT, new_vma_page()
searches forward via vm_next for the vma that maps the page--i.e., that
does not return -EFAULT.  This assumes that the pages in the list handed
to migrate_pages() are in address order.  This is currently the case.  The
patch documents this assumption in a new comment block for new_vma_page().

If new_vma_page() cannot locate the vma mapping the page in a forward
search in the mm, it will pass a NULL vma to alloc_page_vma().  This will
result in the allocation using the task policy, if any, else system default
policy.  This situation is unlikely, but the patch documents this behavior
with a comment.
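
The forward search described above can be illustrated with a small
user-space sketch.  This is not the patch itself: the sketch_* types and
functions and SKETCH_EFAULT are hypothetical stand-ins for struct
vm_area_struct, struct page, page_address_in_vma() and EFAULT, reduced to
the fields needed to show the control flow.

```c
#include <stddef.h>

/* Stand-in for the kernel's EFAULT; the name is an assumption. */
#define SKETCH_EFAULT 14UL

/* Hypothetical, simplified stand-in for struct vm_area_struct. */
struct sketch_vma {
	unsigned long start;		/* first address covered */
	unsigned long end;		/* one past the last address covered */
	struct sketch_vma *next;	/* next vma in address order (vm_next) */
};

/* Hypothetical page: records only the address at which it is mapped. */
struct sketch_page {
	unsigned long mapped_at;
};

/*
 * Models page_address_in_vma(): return the page's address if this vma
 * maps it, otherwise -SKETCH_EFAULT.  No bug check for a mismatch.
 */
static unsigned long sketch_page_address_in_vma(const struct sketch_page *page,
						const struct sketch_vma *vma)
{
	if (page->mapped_at < vma->start || page->mapped_at >= vma->end)
		return -SKETCH_EFAULT;
	return page->mapped_at;
}

/*
 * Search forward from the given vma for the one that maps the page.
 * Returns that vma and stores the page address in *address.  Returns
 * NULL if no vma in the rest of the list maps the page; a caller would
 * then fall back to task or system default policy.  In the NULL case
 * *address holds -SKETCH_EFAULT, or is left untouched if the starting
 * vma was already NULL.
 */
static const struct sketch_vma *
sketch_find_vma_for_page(const struct sketch_page *page,
			 const struct sketch_vma *vma,
			 unsigned long *address)
{
	while (vma) {
		*address = sketch_page_address_in_vma(page, vma);
		if (*address != -SKETCH_EFAULT)
			break;
		vma = vma->next;
	}
	return vma;
}
```

With three adjacent vmas linked in address order, a page mapped in the
second one is found after one step from the head of the list, and a page
mapped by none of them yields NULL.  The search always starts from the vma
it is handed, so it only finds pages at or beyond that vma, which is why
the list is assumed to be in address order.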

Note, this patch results in restarting from the first vma in a multi-vma
range each time new_vma_page() is called.  If this is not acceptable, we
can make the vma argument a pointer, both in new_vma_page() and its caller
unmap_and_move() so that the value held by the loop in migrate_pages()
always passes down the last vma in which a page was found.  This will
require changes to all new_page_t functions passed to migrate_pages().  Is
this necessary?

For this patch to work, we can't bug check in vma_address() for pages
outside the argument vma.  This patch removes the BUG_ON().  All other
callers [besides new_vma_page()] already check the return status.

Tested on x86_64, 4 node NUMA platform.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1 parent e1a1c99
File Mode Size
Kbuild -rw-r--r-- 62 bytes
a.out.h -rw-r--r-- 797 bytes
adb_iop.h -rw-r--r-- 1.1 KB
amigahw.h -rw-r--r-- 11.1 KB
amigaints.h -rw-r--r-- 3.5 KB
amigayle.h -rw-r--r-- 3.1 KB
amipcmcia.h -rw-r--r-- 2.5 KB
apollodma.h -rw-r--r-- 9.2 KB
apollohw.h -rw-r--r-- 2.8 KB
atafd.h -rw-r--r-- 261 bytes
atafdreg.h -rw-r--r-- 2.6 KB
atari_joystick.h -rw-r--r-- 418 bytes
atari_stdma.h -rw-r--r-- 458 bytes
atari_stram.h -rw-r--r-- 429 bytes
atarihw.h -rw-r--r-- 20.2 KB
atariints.h -rw-r--r-- 5.4 KB
atarikb.h -rw-r--r-- 1.5 KB
atomic.h -rw-r--r-- 4.0 KB
auxvec.h -rw-r--r-- 62 bytes
bitops.h -rw-r--r-- 9.8 KB
blinken.h -rw-r--r-- 617 bytes
bootinfo.h -rw-r--r-- 11.6 KB
bug.h -rw-r--r-- 492 bytes
bugs.h -rw-r--r-- 266 bytes
bvme6000hw.h -rw-r--r-- 3.4 KB
byteorder.h -rw-r--r-- 550 bytes
cache.h -rw-r--r-- 215 bytes
cachectl.h -rw-r--r-- 496 bytes
cacheflush.h -rw-r--r-- 4.1 KB
checksum.h -rw-r--r-- 3.3 KB
contregs.h -rw-r--r-- 112 bytes
cputime.h -rw-r--r-- 115 bytes
current.h -rw-r--r-- 135 bytes
delay.h -rw-r--r-- 1.3 KB
device.h -rw-r--r-- 129 bytes
div64.h -rw-r--r-- 710 bytes
dma-mapping.h -rw-r--r-- 2.5 KB
dma.h -rw-r--r-- 519 bytes
dsp56k.h -rw-r--r-- 1.2 KB
dvma.h -rw-r--r-- 9.8 KB
elf.h -rw-r--r-- 3.1 KB
emergency-restart.h -rw-r--r-- 149 bytes
entry.h -rw-r--r-- 2.8 KB
errno.h -rw-r--r-- 104 bytes
fb.h -rw-r--r-- 792 bytes
fbio.h -rw-r--r-- 28 bytes
fcntl.h -rw-r--r-- 313 bytes
floppy.h -rw-r--r-- 5.1 KB
fpu.h -rw-r--r-- 530 bytes
futex.h -rw-r--r-- 82 bytes
hardirq.h -rw-r--r-- 363 bytes
hp300hw.h -rw-r--r-- 982 bytes
hw_irq.h -rw-r--r-- 86 bytes
hwtest.h -rw-r--r-- 428 bytes
ide.h -rw-r--r-- 3.4 KB
idprom.h -rw-r--r-- 174 bytes
intersil.h -rw-r--r-- 1.1 KB
io.h -rw-r--r-- 11.9 KB
ioctl.h -rw-r--r-- 31 bytes
ioctls.h -rw-r--r-- 2.6 KB
ipcbuf.h -rw-r--r-- 631 bytes
irq.h -rw-r--r-- 3.5 KB
irq_regs.h -rw-r--r-- 34 bytes
kdebug.h -rw-r--r-- 32 bytes
kmap_types.h -rw-r--r-- 317 bytes
linkage.h -rw-r--r-- 113 bytes
local.h -rw-r--r-- 116 bytes
mac_asc.h -rw-r--r-- 481 bytes
mac_baboon.h -rw-r--r-- 852 bytes
mac_iop.h -rw-r--r-- 5.3 KB
mac_mouse.h -rw-r--r-- 433 bytes
mac_oss.h -rw-r--r-- 2.5 KB
mac_psc.h -rw-r--r-- 7.1 KB
mac_via.h -rw-r--r-- 11.1 KB
machdep.h -rw-r--r-- 1.2 KB
machines.h -rw-r--r-- 3.2 KB
machw.h -rw-r--r-- 2.5 KB
macintosh.h -rw-r--r-- 3.5 KB
macints.h -rw-r--r-- 4.0 KB
math-emu.h -rw-r--r-- 6.6 KB
mc146818rtc.h -rw-r--r-- 526 bytes
md.h -rw-r--r-- 249 bytes
mman.h -rw-r--r-- 616 bytes
mmu.h -rw-r--r-- 115 bytes
mmu_context.h -rw-r--r-- 3.3 KB
mmzone.h -rw-r--r-- 225 bytes
module.h -rw-r--r-- 814 bytes
motorola_pgalloc.h -rw-r--r-- 2.0 KB
motorola_pgtable.h -rw-r--r-- 9.5 KB
movs.h -rw-r--r-- 1.4 KB
msgbuf.h -rw-r--r-- 974 bytes
mutex.h -rw-r--r-- 308 bytes
mvme147hw.h -rw-r--r-- 2.8 KB
mvme16xhw.h -rw-r--r-- 2.5 KB
namei.h -rw-r--r-- 303 bytes
nubus.h -rw-r--r-- 1.2 KB
openprom.h -rw-r--r-- 8.0 KB
oplib.h -rw-r--r-- 9.6 KB
page.h -rw-r--r-- 5.4 KB
page_offset.h -rw-r--r-- 143 bytes
param.h -rw-r--r-- 438 bytes
parport.h -rw-r--r-- 793 bytes
pci.h -rw-r--r-- 1.1 KB
percpu.h -rw-r--r-- 123 bytes
pgalloc.h -rw-r--r-- 316 bytes
pgtable.h -rw-r--r-- 4.3 KB
poll.h -rw-r--r-- 134 bytes
posix_types.h -rw-r--r-- 1.9 KB
processor.h -rw-r--r-- 3.0 KB
ptrace.h -rw-r--r-- 1.7 KB
q40_master.h -rw-r--r-- 2.2 KB
q40ints.h -rw-r--r-- 808 bytes
raw_io.h -rw-r--r-- 7.9 KB
resource.h -rw-r--r-- 116 bytes
rtc.h -rw-r--r-- 1.8 KB
sbus.h -rw-r--r-- 1.1 KB
scatterlist.h -rw-r--r-- 507 bytes
sections.h -rw-r--r-- 128 bytes
segment.h -rw-r--r-- 1.1 KB
semaphore-helper.h -rw-r--r-- 2.7 KB
semaphore.h -rw-r--r-- 3.7 KB
sembuf.h -rw-r--r-- 696 bytes
serial.h -rw-r--r-- 1.1 KB
setup.h -rw-r--r-- 11.0 KB
shm.h -rw-r--r-- 1.0 KB
shmbuf.h -rw-r--r-- 1.1 KB
shmparam.h -rw-r--r-- 146 bytes
sigcontext.h -rw-r--r-- 488 bytes
siginfo.h -rw-r--r-- 2.0 KB
signal.h -rw-r--r-- 4.3 KB
socket.h -rw-r--r-- 1.2 KB
sockios.h -rw-r--r-- 370 bytes
spinlock.h -rw-r--r-- 94 bytes
stat.h -rw-r--r-- 1.6 KB
statfs.h -rw-r--r-- 108 bytes
string.h -rw-r--r-- 2.7 KB
sun3-head.h -rw-r--r-- 369 bytes
sun3_pgalloc.h -rw-r--r-- 2.1 KB
sun3_pgtable.h -rw-r--r-- 8.0 KB
sun3ints.h -rw-r--r-- 989 bytes
sun3mmu.h -rw-r--r-- 4.9 KB
sun3x.h -rw-r--r-- 829 bytes
sun3xflop.h -rw-r--r-- 5.6 KB
sun3xprom.h -rw-r--r-- 1.3 KB
suspend.h -rw-r--r-- 101 bytes
system.h -rw-r--r-- 5.0 KB
termbits.h -rw-r--r-- 4.5 KB
termios.h -rw-r--r-- 2.8 KB
thread_info.h -rw-r--r-- 1.9 KB
timex.h -rw-r--r-- 286 bytes
tlb.h -rw-r--r-- 447 bytes
tlbflush.h -rw-r--r-- 4.8 KB
topology.h -rw-r--r-- 128 bytes
traps.h -rw-r--r-- 8.3 KB
types.h -rw-r--r-- 1.4 KB
uaccess.h -rw-r--r-- 9.7 KB
ucontext.h -rw-r--r-- 531 bytes
unaligned.h -rw-r--r-- 390 bytes
unistd.h -rw-r--r-- 9.8 KB
user.h -rw-r--r-- 3.8 KB
virtconvert.h -rw-r--r-- 1.2 KB
xor.h -rw-r--r-- 29 bytes
zorro.h -rw-r--r-- 1.1 KB
