Revision 5b77e95dd7790ff6c8fbf1cd8d0104ebed818a03 authored by Alexander Potapenko on 02 April 2019, 11:28:13 UTC, committed by Ingo Molnar on 06 April 2019, 07:52:02 UTC
There's a number of problems with how arch/x86/include/asm/bitops.h
is currently using assembly constraints for the memory region
bitops are modifying:

1) Use memory clobber in bitops that touch arbitrary memory

Certain bit operations that read/write bits take a base pointer and an
arbitrarily large offset to address the bit relative to that base.
Inline assembly constraints aren't expressive enough to tell the
compiler that the assembly directive is going to touch a specific memory
location of unknown size, therefore we have to use the "memory" clobber
to indicate that the assembly is going to access memory locations other
than those listed in the inputs/outputs.

To indicate that BTR/BTS instructions don't necessarily touch the first
sizeof(long) bytes of the argument, we also move the address to assembly
inputs.

This particular change leads to size increase of 124 kernel functions in
a defconfig build. For some of them the diff is in NOP operations, other
end up re-reading values from memory and may potentially slow down the
execution. But without these clobbers the compiler is free to cache
the contents of the bitmaps and use them as if they weren't changed by
the inline assembly.

2) Use byte-sized arguments for operations touching single bytes.

Passing a long value to ANDB/ORB/XORB instructions makes the compiler
treat sizeof(long) bytes as being clobbered, which isn't the case. This
may theoretically lead to worse code in the case of heavy optimization.

Practical impact:

I've built a defconfig kernel and looked through some of the functions
generated by GCC 7.3.0 with and without this clobber, and didn't spot
any miscompilations.

However there is a (trivial) theoretical case where this code leads to
miscompilation:

  https://lkml.org/lkml/2019/3/28/393

using just GCC 8.3.0 with -O2.  It isn't hard to imagine someone writes
such a function in the kernel someday.

So the primary motivation is to fix an existing misuse of the asm
directive, which happens to work in certain configurations now, but
isn't guaranteed to work under different circumstances.

[ --mingo: Added -stable tag because defconfig only builds a fraction
  of the kernel and the trivial testcase looks normal enough to
  be used in existing or in-development code. ]

Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: James Y Knight <jyknight@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20190402112813.193378-1-glider@google.com
[ Edited the changelog, tidied up one of the defines. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
1 parent faa3604
Raw File
sync_file.txt
===================
Sync File API Guide
===================

:Author: Gustavo Padovan <gustavo at padovan dot org>

This document serves as a guide for device drivers writers on what the
sync_file API is, and how drivers can support it. Sync file is the carrier of
the fences(struct dma_fence) that are needed to synchronize between drivers or
across process boundaries.

The sync_file API is meant to be used to send and receive fence information
to/from userspace. It enables userspace to do explicit fencing, where instead
of attaching a fence to the buffer a producer driver (such as a GPU or V4L
driver) sends the fence related to the buffer to userspace via a sync_file.

The sync_file then can be sent to the consumer (DRM driver for example), that
will not use the buffer for anything before the fence(s) signals, i.e., the
driver that issued the fence is not using/processing the buffer anymore, so it
signals that the buffer is ready to use. And vice-versa for the consumer ->
producer part of the cycle.

Sync files allows userspace awareness on buffer sharing synchronization between
drivers.

Sync file was originally added in the Android kernel but current Linux Desktop
can benefit a lot from it.

in-fences and out-fences
------------------------

Sync files can go either to or from userspace. When a sync_file is sent from
the driver to userspace we call the fences it contains 'out-fences'. They are
related to a buffer that the driver is processing or is going to process, so
the driver creates an out-fence to be able to notify, through
dma_fence_signal(), when it has finished using (or processing) that buffer.
Out-fences are fences that the driver creates.

On the other hand if the driver receives fence(s) through a sync_file from
userspace we call these fence(s) 'in-fences'. Receiving in-fences means that
we need to wait for the fence(s) to signal before using any buffer related to
the in-fences.

Creating Sync Files
-------------------

When a driver needs to send an out-fence userspace it creates a sync_file.

Interface::

	struct sync_file *sync_file_create(struct dma_fence *fence);

The caller pass the out-fence and gets back the sync_file. That is just the
first step, next it needs to install an fd on sync_file->file. So it gets an
fd::

	fd = get_unused_fd_flags(O_CLOEXEC);

and installs it on sync_file->file::

	fd_install(fd, sync_file->file);

The sync_file fd now can be sent to userspace.

If the creation process fail, or the sync_file needs to be released by any
other reason fput(sync_file->file) should be used.

Receiving Sync Files from Userspace
-----------------------------------

When userspace needs to send an in-fence to the driver it passes file descriptor
of the Sync File to the kernel. The kernel can then retrieve the fences
from it.

Interface::

	struct dma_fence *sync_file_get_fence(int fd);


The returned reference is owned by the caller and must be disposed of
afterwards using dma_fence_put(). In case of error, a NULL is returned instead.

References:

1. struct sync_file in include/linux/sync_file.h
2. All interfaces mentioned above defined in include/linux/sync_file.h
back to top