Revision a9ce385344f916cd1c36a33905e564f5581beae9 authored by Jens Axboe on 15 September 2023, 19:14:23 UTC, committed by Mike Snitzer on 15 September 2023, 19:39:59 UTC
dm looks up the table for IO based on the request type, with an
assumption that if the request is marked REQ_NOWAIT, it's fine to
attempt to submit that IO while under RCU read lock protection. This
is not OK, as REQ_NOWAIT just means that we should not be sleeping
waiting on other IO, it does not mean that we can't potentially
schedule.

A simple test case demonstrates this quite nicely:

int main(int argc, char *argv[])
{
        struct iovec iov;
        int fd;

        fd = open("/dev/dm-0", O_RDONLY | O_DIRECT);
        posix_memalign(&iov.iov_base, 4096, 4096);
        iov.iov_len = 4096;
        preadv2(fd, &iov, 1, 0, RWF_NOWAIT);
        return 0;
}

which will instantly spew:

BUG: sleeping function called from invalid context at include/linux/sched/mm.h:306
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 5580, name: dm-nowait
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
INFO: lockdep is turned off.
CPU: 7 PID: 5580 Comm: dm-nowait Not tainted 6.6.0-rc1-g39956d2dcd81 #132
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x11d/0x1b0
 __might_resched+0x3c3/0x5e0
 ? preempt_count_sub+0x150/0x150
 mempool_alloc+0x1e2/0x390
 ? mempool_resize+0x7d0/0x7d0
 ? lock_sync+0x190/0x190
 ? lock_release+0x4b7/0x670
 ? internal_get_user_pages_fast+0x868/0x2d40
 bio_alloc_bioset+0x417/0x8c0
 ? bvec_alloc+0x200/0x200
 ? internal_get_user_pages_fast+0xb8c/0x2d40
 bio_alloc_clone+0x53/0x100
 dm_submit_bio+0x27f/0x1a20
 ? lock_release+0x4b7/0x670
 ? blk_try_enter_queue+0x1a0/0x4d0
 ? dm_dax_direct_access+0x260/0x260
 ? rcu_is_watching+0x12/0xb0
 ? blk_try_enter_queue+0x1cc/0x4d0
 __submit_bio+0x239/0x310
 ? __bio_queue_enter+0x700/0x700
 ? kvm_clock_get_cycles+0x40/0x60
 ? ktime_get+0x285/0x470
 submit_bio_noacct_nocheck+0x4d9/0xb80
 ? should_fail_request+0x80/0x80
 ? preempt_count_sub+0x150/0x150
 ? lock_release+0x4b7/0x670
 ? __bio_add_page+0x143/0x2d0
 ? iov_iter_revert+0x27/0x360
 submit_bio_noacct+0x53e/0x1b30
 submit_bio_wait+0x10a/0x230
 ? submit_bio_wait_endio+0x40/0x40
 __blkdev_direct_IO_simple+0x4f8/0x780
 ? blkdev_bio_end_io+0x4c0/0x4c0
 ? stack_trace_save+0x90/0xc0
 ? __bio_clone+0x3c0/0x3c0
 ? lock_release+0x4b7/0x670
 ? lock_sync+0x190/0x190
 ? atime_needs_update+0x3bf/0x7e0
 ? timestamp_truncate+0x21b/0x2d0
 ? inode_owner_or_capable+0x240/0x240
 blkdev_direct_IO.part.0+0x84a/0x1810
 ? rcu_is_watching+0x12/0xb0
 ? lock_release+0x4b7/0x670
 ? blkdev_read_iter+0x40d/0x530
 ? reacquire_held_locks+0x4e0/0x4e0
 ? __blkdev_direct_IO_simple+0x780/0x780
 ? rcu_is_watching+0x12/0xb0
 ? __mark_inode_dirty+0x297/0xd50
 ? preempt_count_add+0x72/0x140
 blkdev_read_iter+0x2a4/0x530
 do_iter_readv_writev+0x2f2/0x3c0
 ? generic_copy_file_range+0x1d0/0x1d0
 ? fsnotify_perm.part.0+0x25d/0x630
 ? security_file_permission+0xd8/0x100
 do_iter_read+0x31b/0x880
 ? import_iovec+0x10b/0x140
 vfs_readv+0x12d/0x1a0
 ? vfs_iter_read+0xb0/0xb0
 ? rcu_is_watching+0x12/0xb0
 ? rcu_is_watching+0x12/0xb0
 ? lock_release+0x4b7/0x670
 do_preadv+0x1b3/0x260
 ? do_readv+0x370/0x370
 __x64_sys_preadv2+0xef/0x150
 do_syscall_64+0x39/0xb0
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f5af41ad806
Code: 41 54 41 89 fc 55 44 89 c5 53 48 89 cb 48 83 ec 18 80 3d e4 dd 0d 00 00 74 7a 45 89 c1 49 89 ca 45 31 c0 b8 47 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 be 00 00 00 48 85 c0 79 4a 48 8b 0d da 55
RSP: 002b:00007ffd3145c7f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000147
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5af41ad806
RDX: 0000000000000001 RSI: 00007ffd3145c850 RDI: 0000000000000003
RBP: 0000000000000008 R08: 0000000000000000 R09: 0000000000000008
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
R13: 00007ffd3145c850 R14: 000055f5f0431dd8 R15: 0000000000000001
 </TASK>

where in fact it is dm itself that attempts to allocate a bio clone with
GFP_NOIO under the rcu read lock, regardless of the request type.

Fix this by getting rid of the special casing for REQ_NOWAIT, and just
use the normal SRCU protected table lookup. Get rid of the bio based
table locking helpers at the same time, as they are now unused.

Cc: stable@vger.kernel.org
Fixes: 563a225c9fd2 ("dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
1 parent f6007dc
Raw File
hidraw.rst
================================================================
HIDRAW - Raw Access to USB and Bluetooth Human Interface Devices
================================================================

The hidraw driver provides a raw interface to USB and Bluetooth Human
Interface Devices (HIDs).  It differs from hiddev in that reports sent and
received are not parsed by the HID parser, but are sent to and received from
the device unmodified.

Hidraw should be used if the userspace application knows exactly how to
communicate with the hardware device, and is able to construct the HID
reports manually.  This is often the case when making userspace drivers for
custom HID devices.

Hidraw is also useful for communicating with non-conformant HID devices
which send and receive data in a way that is inconsistent with their report
descriptors.  Because hiddev parses reports which are sent and received
through it, checking them against the device's report descriptor, such
communication with these non-conformant devices is impossible using hiddev.
Hidraw is the only alternative, short of writing a custom kernel driver, for
these non-conformant devices.

A benefit of hidraw is that its use by userspace applications is independent
of the underlying hardware type.  Currently, hidraw is implemented for USB
and Bluetooth.  In the future, as new hardware bus types are developed which
use the HID specification, hidraw will be expanded to add support for these
new bus types.

Hidraw uses a dynamic major number, meaning that udev should be relied on to
create hidraw device nodes.  Udev will typically create the device nodes
directly under /dev (eg: /dev/hidraw0).  As this location is distribution-
and udev rule-dependent, applications should use libudev to locate hidraw
devices attached to the system.  There is a tutorial on libudev with a
working example at::

	http://www.signal11.us/oss/udev/
	https://web.archive.org/web/2019*/www.signal11.us

The HIDRAW API
---------------

read()
-------
read() will read a queued report received from the HID device. On USB
devices, the reports read using read() are the reports sent from the device
on the INTERRUPT IN endpoint.  By default, read() will block until there is
a report available to be read.  read() can be made non-blocking, by passing
the O_NONBLOCK flag to open(), or by setting the O_NONBLOCK flag using
fcntl().

On a device which uses numbered reports, the first byte of the returned data
will be the report number; the report data follows, beginning in the second
byte.  For devices which do not use numbered reports, the report data
will begin at the first byte.

write()
-------
The write() function will write a report to the device. For USB devices, if
the device has an INTERRUPT OUT endpoint, the report will be sent on that
endpoint. If it does not, the report will be sent over the control endpoint,
using a SET_REPORT transfer.

The first byte of the buffer passed to write() should be set to the report
number.  If the device does not use numbered reports, the first byte should
be set to 0. The report data itself should begin at the second byte.

ioctl()
-------
Hidraw supports the following ioctls:

HIDIOCGRDESCSIZE:
	Get Report Descriptor Size

This ioctl will get the size of the device's report descriptor.

HIDIOCGRDESC:
	Get Report Descriptor

This ioctl returns the device's report descriptor using a
hidraw_report_descriptor struct.  Make sure to set the size field of the
hidraw_report_descriptor struct to the size returned from HIDIOCGRDESCSIZE.

HIDIOCGRAWINFO:
	Get Raw Info

This ioctl will return a hidraw_devinfo struct containing the bus type, the
vendor ID (VID), and product ID (PID) of the device. The bus type can be one
of::

	- BUS_USB
	- BUS_HIL
	- BUS_BLUETOOTH
	- BUS_VIRTUAL

which are defined in uapi/linux/input.h.

HIDIOCGRAWNAME(len):
	Get Raw Name

This ioctl returns a string containing the vendor and product strings of
the device.  The returned string is Unicode, UTF-8 encoded.

HIDIOCGRAWPHYS(len):
	Get Physical Address

This ioctl returns a string representing the physical address of the device.
For USB devices, the string contains the physical path to the device (the
USB controller, hubs, ports, etc).  For Bluetooth devices, the string
contains the hardware (MAC) address of the device.

HIDIOCSFEATURE(len):
	Send a Feature Report

This ioctl will send a feature report to the device.  Per the HID
specification, feature reports are always sent using the control endpoint.
Set the first byte of the supplied buffer to the report number.  For devices
which do not use numbered reports, set the first byte to 0. The report data
begins in the second byte. Make sure to set len accordingly, to one more
than the length of the report (to account for the report number).

HIDIOCGFEATURE(len):
	Get a Feature Report

This ioctl will request a feature report from the device using the control
endpoint.  The first byte of the supplied buffer should be set to the report
number of the requested report.  For devices which do not use numbered
reports, set the first byte to 0.  The returned report buffer will contain the
report number in the first byte, followed by the report data read from the
device.  For devices which do not use numbered reports, the report data will
begin at the first byte of the returned buffer.

HIDIOCSINPUT(len):
	Send an Input Report

This ioctl will send an input report to the device, using the control endpoint.
In most cases, setting an input HID report on a device is meaningless and has
no effect, but some devices may choose to use this to set or reset an initial
state of a report.  The format of the buffer issued with this report is identical
to that of HIDIOCSFEATURE.

HIDIOCGINPUT(len):
	Get an Input Report

This ioctl will request an input report from the device using the control
endpoint.  This is slower on most devices where a dedicated In endpoint exists
for regular input reports, but allows the host to request the value of a
specific report number.  Typically, this is used to request the initial states of
an input report of a device, before an application listens for normal reports via
the regular device read() interface.  The format of the buffer issued with this report
is identical to that of HIDIOCGFEATURE.

HIDIOCSOUTPUT(len):
	Send an Output Report

This ioctl will send an output report to the device, using the control endpoint.
This is slower on most devices where a dedicated Out endpoint exists for regular
output reports, but is added for completeness.  Typically, this is used to set
the initial states of an output report of a device, before an application sends
updates via the regular device write() interface. The format of the buffer issued
with this report is identical to that of HIDIOCSFEATURE.

HIDIOCGOUTPUT(len):
	Get an Output Report

This ioctl will request an output report from the device using the control
endpoint.  Typically, this is used to retrieve the initial state of
an output report of a device, before an application updates it as necessary either
via a HIDIOCSOUTPUT request, or the regular device write() interface.  The format
of the buffer issued with this report is identical to that of HIDIOCGFEATURE.

Example
-------
In samples/, find hid-example.c, which shows examples of read(), write(),
and all the ioctls for hidraw.  The code may be used by anyone for any
purpose, and can serve as a starting point for developing applications using
hidraw.

Document by:

	Alan Ott <alan@signal11.us>, Signal 11 Software
back to top