Revision 3ad33b2436b545cbe8b28e53f3710432cad457ab authored by Lee Schermerhorn on 15 November 2007, 00:59:10 UTC, committed by Linus Torvalds on 15 November 2007, 02:45:38 UTC
We hit the BUG_ON() in mm/rmap.c:vma_address() when using
mbind(MPOL_MF_MOVE) to migrate a non-anon region that spans multiple vmas.
For anon regions, we simply fail to migrate any pages beyond the first vma
in the range.

This occurs because do_mbind() collects a list of pages to migrate by
calling check_range().  check_range() walks the task's mm, spanning vmas as
necessary, to collect the migratable pages into a list.  Then, do_mbind()
calls migrate_pages() passing the list of pages, a function to allocate new
pages based on vma policy [new_vma_page()], and a pointer to the first vma
of the range.
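The sketch below is a user-space model of the collection step, written to show why the page list can span several vmas while only the first vma is handed on. It is not the kernel's check_range(). The structure, the function collect_range(), and the fields used are hypothetical simplifications. It assumes a page-aligned start address, 4 KiB pages, and a vma list sorted by address.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)
#define MAX_PAGES 64

/* Hypothetical, heavily simplified stand-in for struct vm_area_struct. */
struct vma {
	unsigned long vm_start;   /* first address covered by the vma */
	unsigned long vm_end;     /* one past the last covered address */
	struct vma *vm_next;      /* next vma, in ascending address order */
};

/* Hypothetical model of the page list: the address of each page found. */
struct page_list {
	unsigned long addr[MAX_PAGES];
	size_t n;
};

/*
 * Walk every vma overlapping [start, end) in ascending order and append
 * each page address to the list.  Because vmas are visited in address
 * order and each one is scanned low to high, the list ends up sorted by
 * address even though it spans several vmas.  Returns the first vma of
 * the range, which is all that gets passed along with the list.
 */
static struct vma *collect_range(struct vma *mmap, unsigned long start,
				 unsigned long end, struct page_list *list)
{
	struct vma *first = NULL;
	struct vma *vma;

	assert((start & (PAGE_SIZE - 1)) == 0); /* sketch assumes alignment */
	list->n = 0;
	for (vma = mmap; vma && vma->vm_start < end; vma = vma->vm_next) {
		unsigned long a, lo, hi;

		if (vma->vm_end <= start)
			continue;       /* vma lies entirely below the range */
		if (!first)
			first = vma;
		lo = vma->vm_start > start ? vma->vm_start : start;
		hi = vma->vm_end < end ? vma->vm_end : end;
		for (a = lo; a < hi && list->n < MAX_PAGES; a += PAGE_SIZE)
			list->addr[list->n++] = a;
	}
	return first;
}
```

With two adjacent vmas covering the range, the list holds pages from both, but only the first vma is returned.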

For each page in the list, new_vma_page() calls page_address_in_vma(),
passing the page and the vma [first in range], to obtain the address to
pass to alloc_page_vma().  The page address is needed to get the
interleaving policy correct.  If the pages in the list come from multiple
vmas, new_vma_page() will eventually pass a page to page_address_in_vma()
with the wrong vma.  For !PageAnon pages, this triggers the bug check in
rmap.c:vma_address().  For anon pages, vma_address() just returns -EFAULT
and the migration fails.
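The following simplified model shows why the wrong vma produces an out-of-range address. The computation address = vm_start + ((page index - vm_pgoff) << PAGE_SHIFT) matches the shape of vma_address(). The structures are hypothetical, and file_id stands in for the page/file mapping comparison. The model returns a -EFAULT sentinel where the pre-patch kernel hit BUG_ON() for non-anon pages. In the example, the range spans two vmas that map consecutive pages of the same file, as can happen after an earlier mbind() has split a mapping.

```c
#include <assert.h>
#include <errno.h>

#define PAGE_SHIFT 12
/* Failure is reported as -EFAULT converted to unsigned long. */
#define BAD_ADDR ((unsigned long)-EFAULT)

/* Hypothetical simplified vma mapping file pages from vm_pgoff onward. */
struct vma {
	unsigned long vm_start, vm_end; /* [vm_start, vm_end) */
	unsigned long vm_pgoff;         /* file page mapped at vm_start */
	int file_id;                    /* stand-in for the mapped file */
};

/* Hypothetical simplified page: which file, and which page within it. */
struct page {
	int file_id;          /* stand-in for page->mapping */
	unsigned long index;  /* page offset within the file */
};

/* Model of the vma_address() arithmetic plus its range check. */
static unsigned long model_vma_address(const struct page *page,
				       const struct vma *vma)
{
	unsigned long address;

	assert(vma->vm_start < vma->vm_end);
	address = vma->vm_start + ((page->index - vma->vm_pgoff) << PAGE_SHIFT);
	if (address < vma->vm_start || address >= vma->vm_end)
		return BAD_ADDR; /* page lies outside this vma */
	return address;
}

/* Model of page_address_in_vma() for a file-backed page. */
static unsigned long model_page_address_in_vma(const struct page *page,
					       const struct vma *vma)
{
	if (page->file_id != vma->file_id)
		return BAD_ADDR; /* vma maps some other file */
	return model_vma_address(page, vma);
}
```

File page 5 lives in the second vma (file pages 4 to 7). Checked against the first vma (file pages 0 to 3), its computed address lands past vm_end.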

This patch modifies new_vma_page() to check the return value from
page_address_in_vma().  If the return value is -EFAULT, new_vma_page()
searches forward via vm_next for the vma that maps the page, i.e., the
first vma for which page_address_in_vma() does not return -EFAULT.  This
assumes that the pages in the list handed to migrate_pages() are in
address order, which is currently the case.  The patch documents this
assumption in a new comment block for new_vma_page().
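The forward search can be sketched in user space as follows. This is a model of the idea, not the patch's code. find_mapping_vma(), the structures, and file_id are hypothetical. Starting from the first vma of the range, the model follows vm_next until a vma yields a valid address. If none does, it reports a NULL vma.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define BAD_ADDR ((unsigned long)-EFAULT)

/* Hypothetical simplified vma, linked in ascending address order. */
struct vma {
	unsigned long vm_start, vm_end; /* [vm_start, vm_end) */
	unsigned long vm_pgoff;         /* file page mapped at vm_start */
	int file_id;                    /* stand-in for the mapped file */
	struct vma *vm_next;
};

/* Hypothetical simplified page. */
struct page {
	int file_id;
	unsigned long index;
};

/* Model of page_address_in_vma(): BAD_ADDR unless vma maps the page. */
static unsigned long model_page_address_in_vma(const struct page *page,
					       const struct vma *vma)
{
	unsigned long address;

	if (page->file_id != vma->file_id)
		return BAD_ADDR;
	address = vma->vm_start + ((page->index - vma->vm_pgoff) << PAGE_SHIFT);
	if (address < vma->vm_start || address >= vma->vm_end)
		return BAD_ADDR;
	return address;
}

/*
 * Search forward from 'start' via vm_next for the vma mapping 'page'.
 * Relies on the page list being in address order, so the mapping vma is
 * never behind 'start'.  Stores the vma found (or NULL) in *vmap.
 */
static unsigned long find_mapping_vma(const struct page *page,
				      struct vma *start, struct vma **vmap)
{
	unsigned long address = BAD_ADDR;
	struct vma *vma;

	assert(vmap != NULL);
	for (vma = start; vma; vma = vma->vm_next) {
		address = model_page_address_in_vma(page, vma);
		if (address != BAD_ADDR)
			break;
	}
	*vmap = vma;
	return address;
}
```

A NULL result corresponds to the case described next, where the allocation has no vma to take its policy from.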

If new_vma_page() cannot locate the vma mapping the page in a forward
search in the mm, it passes a NULL vma to alloc_page_vma().  The
allocation then uses the task policy, if any, or else the system default
policy.  This situation is unlikely, but the patch documents the behavior
with a comment.
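The fallback order described above can be modeled in a few lines. This is a deliberately reduced sketch. The enum, the structures, and pick_policy() are hypothetical, and the kernel's real policy lookup also handles per-file get_policy hooks and other cases not shown here. The point is only that a NULL vma skips the vma-policy step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy kinds; the kernel's struct mempolicy is richer. */
enum policy { POL_DEFAULT, POL_INTERLEAVE, POL_BIND, POL_PREFERRED };

struct vma  { enum policy *vm_policy; };  /* NULL: vma has no policy */
struct task { enum policy *mempolicy; };  /* NULL: task has no policy */

static enum policy system_default = POL_DEFAULT;

/*
 * Fallback order: vma policy if there is a vma that has one, else the
 * task policy, else the system default.  A NULL vma, as passed when no
 * mapping vma was found, goes straight to the task/default steps.
 */
static enum policy *pick_policy(const struct task *task,
				const struct vma *vma)
{
	enum policy *pol;

	assert(task != NULL);
	pol = task->mempolicy;
	if (vma && vma->vm_policy)
		pol = vma->vm_policy;
	if (!pol)
		pol = &system_default;
	return pol;
}
```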

Note that this patch restarts the search from the first vma in a multi-vma
range each time new_vma_page() is called.  If this is not acceptable, we
can make the vma argument a pointer, both in new_vma_page() and its caller
unmap_and_move(), so that the value held by the loop in migrate_pages()
always passes down the last vma in which a page was found.  This would
require changes to all new_page_t functions passed to migrate_pages().  Is
this necessary?
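The trade-off raised above can be compared with a small model. It is a hypothetical sketch: pages are reduced to their addresses, and find_from_cursor(), find_from_start(), and the lookups counter are illustrative names, not kernel interfaces. find_from_start() mirrors this patch by restarting at the first vma for every page. find_from_cursor() mirrors the suggested alternative, in which the caller keeps the last vma found and later, higher-addressed pages resume there.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define BAD_ADDR ((unsigned long)-EFAULT)

/* Hypothetical simplified vma, linked in ascending address order. */
struct vma {
	unsigned long vm_start, vm_end; /* [vm_start, vm_end) */
	struct vma *vm_next;
};

static unsigned long lookups; /* number of per-vma checks performed */

/* Model check: the page (given by its address) is inside this vma? */
static unsigned long addr_in_vma(unsigned long addr, const struct vma *vma)
{
	lookups++;
	return (addr >= vma->vm_start && addr < vma->vm_end) ? addr : BAD_ADDR;
}

/*
 * Resume the search at *cursor.  On success, leave *cursor at the vma
 * that matched so the next page starts there; on failure, leave it
 * unchanged and return NULL.
 */
static struct vma *find_from_cursor(unsigned long addr, struct vma **cursor)
{
	struct vma *vma;

	assert(cursor != NULL);
	for (vma = *cursor; vma; vma = vma->vm_next) {
		if (addr_in_vma(addr, vma) != BAD_ADDR) {
			*cursor = vma;
			return vma;
		}
	}
	return NULL;
}

/* Behavior of this patch: every page restarts at the range's first vma. */
static struct vma *find_from_start(unsigned long addr, struct vma *first)
{
	struct vma *cursor = first;

	return find_from_cursor(addr, &cursor);
}
```

With one page in each of three vmas, restarting costs 1 + 2 + 3 = 6 checks, while resuming from the cursor costs 1 + 2 + 2 = 5. The gap grows with the number of vmas, at the price of changing every new_page_t function's signature.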

For this patch to work, vma_address() must not bug-check pages that lie
outside the vma passed to it, so this patch removes the BUG_ON().  All
other callers [besides new_vma_page()] already check the return status.

Tested on x86_64, 4 node NUMA platform.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1 parent e1a1c99