Revision e2093926a098a8ccf0f1d10f6df8dad452cb28d3 authored by Ross Zwisler on 02 June 2017, 21:46:37 UTC, committed by Linus Torvalds on 02 June 2017, 22:07:37 UTC
We currently have two related PMD vs PTE races in the DAX code.  These
can both be easily triggered by having two threads reading and writing
simultaneously to the same private mapping, with the key being that
private mapping reads can be handled with PMDs but private mapping
writes are always handled with PTEs so that we can COW.

Here is the first race:

  CPU 0					CPU 1

  (private mapping write)
  __handle_mm_fault()
    create_huge_pmd() - FALLBACK
    handle_pte_fault()
      passes check for pmd_devmap()

					(private mapping read)
					__handle_mm_fault()
					  create_huge_pmd()
					    dax_iomap_pmd_fault() inserts PMD

      dax_iomap_pte_fault() does a PTE fault, but we already have a DAX PMD
      			  installed in our page tables at this spot.

Here's the second race:

  CPU 0					CPU 1

  (private mapping read)
  __handle_mm_fault()
    passes check for pmd_none()
    create_huge_pmd()
      dax_iomap_pmd_fault() inserts PMD

  (private mapping write)
  __handle_mm_fault()
    create_huge_pmd() - FALLBACK
					(private mapping read)
					__handle_mm_fault()
					  passes check for pmd_none()
					  create_huge_pmd()

    handle_pte_fault()
      dax_iomap_pte_fault() inserts PTE
					    dax_iomap_pmd_fault() inserts PMD,
					       but we already have a PTE at
					       this spot.

The core of the issue is that while there is isolation between faults to
the same range in the DAX fault handlers via our DAX entry locking,
there is no isolation between faults in the code in mm/memory.c.  This
means for instance that this code in __handle_mm_fault() can run:

	if (pmd_none(*vmf.pmd) && transparent_hugepage_enabled(vma)) {
		ret = create_huge_pmd(&vmf);

But by the time we actually get to run the fault handler called by
create_huge_pmd(), the PMD is no longer pmd_none() because a racing PTE
fault has installed a normal PMD here as a parent.  This is the cause of
the 2nd race.  The first race is similar - there is the following check
in handle_pte_fault():

	} else {
		/* See comment in pte_alloc_one_map() */
		if (pmd_devmap(*vmf->pmd) || pmd_trans_unstable(vmf->pmd))
			return 0;

So if a pmd_devmap() PMD (a DAX PMD) has been installed at vmf->pmd, we
will bail and retry the fault.  This is correct, but there is nothing
preventing the PMD from being installed after this check but before we
actually get to the DAX PTE fault handlers.

In my testing these races result in the following types of errors:

  BUG: Bad rss-counter state mm:ffff8800a817d280 idx:1 val:1
  BUG: non-zero nr_ptes on freeing mm: 15

Fix this issue by having the DAX fault handlers verify that it is safe
to continue their fault after they have taken an entry lock to block
other racing faults.

[ross.zwisler@linux.intel.com: improve fix for colliding PMD & PTE entries]
  Link: http://lkml.kernel.org/r/20170526195932.32178-1-ross.zwisler@linux.intel.com
Link: http://lkml.kernel.org/r/20170522215749.23516-2-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Pawel Lebioda <pawel.lebioda@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Pawel Lebioda <pawel.lebioda@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Xiong Zhou <xzhou@redhat.com>
Cc: Eryu Guan <eguan@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1 parent d0f0931
Raw File
coccicheck
#!/bin/bash
# Linux kernel coccicheck
#
# Read Documentation/dev-tools/coccinelle.rst
#
# This script requires at least spatch
# version 1.0.0-rc11.

DIR="$(dirname $(readlink -f $0))/.."
SPATCH="`which ${SPATCH:=spatch}`"

if [ ! -x "$SPATCH" ]; then
    echo 'spatch is part of the Coccinelle project and is available at http://coccinelle.lip6.fr/'
    exit 1
fi

SPATCH_VERSION=$($SPATCH --version | head -1 | awk '{print $3}')
SPATCH_VERSION_NUM=$(echo $SPATCH_VERSION | ${DIR}/scripts/ld-version.sh)

USE_JOBS="no"
$SPATCH --help | grep "\-\-jobs" > /dev/null && USE_JOBS="yes"

# The verbosity may be set by the environmental parameter V=
# as for example with 'make V=1 coccicheck'

if [ -n "$V" -a "$V" != "0" ]; then
	VERBOSE="$V"
else
	VERBOSE=0
fi

if [ -z "$J" ]; then
	NPROC=$(getconf _NPROCESSORS_ONLN)
else
	NPROC="$J"
fi

FLAGS="--very-quiet"

# You can use SPFLAGS to append extra arguments to coccicheck or override any
# heuristics done in this file as Coccinelle accepts the last options when
# options conflict.
#
# A good example for use of SPFLAGS is if you want to debug your cocci script,
# you can for instance use the following:
#
# $ export COCCI=scripts/coccinelle/misc/irqf_oneshot.cocci
# $ make coccicheck MODE=report DEBUG_FILE="all.err" SPFLAGS="--profile --show-trying" M=./drivers/mfd/arizona-irq.c
#
# "--show-trying" should show you what rule is being processed as it goes to
# stdout, you do not need a debug file for that. The profile output will be
# be sent to stdout, if you provide a DEBUG_FILE the profiling data can be
# inspected there.
#
# --profile will not output if --very-quiet is used, so avoid it.
echo $SPFLAGS | egrep -e "--profile|--show-trying" 2>&1 > /dev/null
if [ $? -eq 0 ]; then
	FLAGS="--quiet"
fi

# spatch only allows include directories with the syntax "-I include"
# while gcc also allows "-Iinclude" and "-include include"
COCCIINCLUDE=${LINUXINCLUDE//-I/-I }
COCCIINCLUDE=${COCCIINCLUDE// -include/ --include}

if [ "$C" = "1" -o "$C" = "2" ]; then
    ONLINE=1

    # Take only the last argument, which is the C file to test
    shift $(( $# - 1 ))
    OPTIONS="$COCCIINCLUDE $1"
else
    ONLINE=0
    if [ "$KBUILD_EXTMOD" = "" ] ; then
        OPTIONS="--dir $srctree $COCCIINCLUDE"
    else
        OPTIONS="--dir $KBUILD_EXTMOD $COCCIINCLUDE"
    fi
fi

if [ "$KBUILD_EXTMOD" != "" ] ; then
    OPTIONS="--patch $srctree $OPTIONS"
fi

# You can override by using SPFLAGS
if [ "$USE_JOBS" = "no" ]; then
	trap kill_running SIGTERM SIGINT
	declare -a SPATCH_PID
elif [ "$NPROC" != "1" ]; then
	# Using 0 should work as well, refer to _SC_NPROCESSORS_ONLN use on
	# https://github.com/rdicosmo/parmap/blob/master/setcore_stubs.c
	OPTIONS="$OPTIONS --jobs $NPROC --chunksize 1"
fi

if [ "$MODE" = "" ] ; then
    if [ "$ONLINE" = "0" ] ; then
	echo 'You have not explicitly specified the mode to use. Using default "report" mode.'
	echo 'Available modes are the following: patch, report, context, org'
	echo 'You can specify the mode with "make coccicheck MODE=<mode>"'
	echo 'Note however that some modes are not implemented by some semantic patches.'
    fi
    MODE="report"
fi

if [ "$MODE" = "chain" ] ; then
    if [ "$ONLINE" = "0" ] ; then
	echo 'You have selected the "chain" mode.'
	echo 'All available modes will be tried (in that order): patch, report, context, org'
    fi
elif [ "$MODE" = "report" -o "$MODE" = "org" ] ; then
    FLAGS="--no-show-diff $FLAGS"
fi

if [ "$ONLINE" = "0" ] ; then
    echo ''
    echo 'Please check for false positives in the output before submitting a patch.'
    echo 'When using "patch" mode, carefully review the patch before submitting it.'
    echo ''
fi

run_cmd_parmap() {
	if [ $VERBOSE -ne 0 ] ; then
		echo "Running ($NPROC in parallel): $@"
	fi
	if [ "$DEBUG_FILE" != "/dev/null" -a "$DEBUG_FILE" != "" ]; then
		if [ -f $DEBUG_FILE ]; then
			echo "Debug file $DEBUG_FILE exists, bailing"
			exit
		fi
	else
		DEBUG_FILE="/dev/null"
	fi
	$@ 2>$DEBUG_FILE
	if [[ $? -ne 0 ]]; then
		echo "coccicheck failed"
		exit $?
	fi
}

run_cmd_old() {
	local i
	if [ $VERBOSE -ne 0 ] ; then
		echo "Running ($NPROC in parallel): $@"
	fi
	for i in $(seq 0 $(( NPROC - 1)) ); do
		eval "$@ --max $NPROC --index $i &"
		SPATCH_PID[$i]=$!
		if [ $VERBOSE -eq 2 ] ; then
			echo "${SPATCH_PID[$i]} running"
		fi
	done
	wait
}

run_cmd() {
	if [ "$USE_JOBS" = "yes" ]; then
		run_cmd_parmap $@
	else
		run_cmd_old $@
	fi
}

kill_running() {
	for i in $(seq 0 $(( NPROC - 1 )) ); do
		if [ $VERBOSE -eq 2 ] ; then
			echo "Killing ${SPATCH_PID[$i]}"
		fi
		kill ${SPATCH_PID[$i]} 2>/dev/null
	done
}

# You can override heuristics with SPFLAGS, these must always go last
OPTIONS="$OPTIONS $SPFLAGS"

coccinelle () {
    COCCI="$1"

    OPT=`grep "Option" $COCCI | cut -d':' -f2`
    REQ=`grep "Requires" $COCCI | cut -d':' -f2 | sed "s| ||"`
    REQ_NUM=$(echo $REQ | ${DIR}/scripts/ld-version.sh)
    if [ "$REQ_NUM" != "0" ] ; then
	    if [ "$SPATCH_VERSION_NUM" -lt "$REQ_NUM" ] ; then
		    echo "Skipping coccinele SmPL patch: $COCCI"
		    echo "You have coccinelle:           $SPATCH_VERSION"
		    echo "This SmPL patch requires:      $REQ"
		    return
	    fi
    fi

#   The option '--parse-cocci' can be used to syntactically check the SmPL files.
#
#    $SPATCH -D $MODE $FLAGS -parse_cocci $COCCI $OPT > /dev/null

    if [ $VERBOSE -ne 0 -a $ONLINE -eq 0 ] ; then

	FILE=`echo $COCCI | sed "s|$srctree/||"`

	echo "Processing `basename $COCCI`"
	echo "with option(s) \"$OPT\""
	echo ''
	echo 'Message example to submit a patch:'

	sed -ne 's|^///||p' $COCCI

	if [ "$MODE" = "patch" ] ; then
	    echo ' The semantic patch that makes this change is available'
	elif [ "$MODE" = "report" ] ; then
	    echo ' The semantic patch that makes this report is available'
	elif [ "$MODE" = "context" ] ; then
	    echo ' The semantic patch that spots this code is available'
	elif [ "$MODE" = "org" ] ; then
	    echo ' The semantic patch that makes this Org report is available'
	else
	    echo ' The semantic patch that makes this output is available'
	fi
	echo " in $FILE."
	echo ''
	echo ' More information about semantic patching is available at'
	echo ' http://coccinelle.lip6.fr/'
	echo ''

	if [ "`sed -ne 's|^//#||p' $COCCI`" ] ; then
	    echo 'Semantic patch information:'
	    sed -ne 's|^//#||p' $COCCI
	    echo ''
	fi
    fi

    if [ "$MODE" = "chain" ] ; then
	run_cmd $SPATCH -D patch   \
		$FLAGS --cocci-file $COCCI $OPT $OPTIONS               || \
	run_cmd $SPATCH -D report  \
		$FLAGS --cocci-file $COCCI $OPT $OPTIONS --no-show-diff || \
	run_cmd $SPATCH -D context \
		$FLAGS --cocci-file $COCCI $OPT $OPTIONS               || \
	run_cmd $SPATCH -D org     \
		$FLAGS --cocci-file $COCCI $OPT $OPTIONS --no-show-diff || exit 1
    elif [ "$MODE" = "rep+ctxt" ] ; then
	run_cmd $SPATCH -D report  \
		$FLAGS --cocci-file $COCCI $OPT $OPTIONS --no-show-diff && \
	run_cmd $SPATCH -D context \
		$FLAGS --cocci-file $COCCI $OPT $OPTIONS || exit 1
    else
	run_cmd $SPATCH -D $MODE   $FLAGS --cocci-file $COCCI $OPT $OPTIONS || exit 1
    fi

}

if [ "$COCCI" = "" ] ; then
    for f in `find $srctree/scripts/coccinelle/ -name '*.cocci' -type f | sort`; do
	coccinelle $f
    done
else
    coccinelle $COCCI
fi
back to top