Revision 9b4fe5fb0bdd8f31f24cbfe77e38ec8155c250c5 authored by Neil Horman on 12 July 2013, 17:35:33 UTC, committed by David S. Miller on 12 July 2013, 23:28:02 UTC
this bug: https://bugzilla.redhat.com/show_bug.cgi?id=951695 Reported a dma debug backtrace: WARNING: at lib/dma-debug.c:937 check_unmap+0x47d/0x930() Hardware name: To Be Filled By O.E.M. via-rhine 0000:00:12.0: DMA-API: device driver failed to check map error[device address=0x0000000075a837b2] [size=90 bytes] [mapped as single] Modules linked in: ip6_tables gspca_spca561 gspca_main videodev media snd_hda_codec_realtek snd_hda_intel i2c_viapro snd_hda_codec snd_hwdep snd_seq ppdev mperf via_rhine coretemp snd_pcm mii microcode snd_page_alloc snd_timer snd_mpu401 snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore parport_pc parport shpchp ata_generic pata_acpi radeon i2c_algo_bit drm_kms_helper ttm drm pata_via sata_via i2c_core uinput Pid: 295, comm: systemd-journal Not tainted 3.9.0-0.rc6.git2.1.fc20.x86_64 #1 Call Trace: <IRQ> [<ffffffff81068dd0>] warn_slowpath_common+0x70/0xa0 [<ffffffff81068e4c>] warn_slowpath_fmt+0x4c/0x50 [<ffffffff8137ec6d>] check_unmap+0x47d/0x930 [<ffffffff810ace9f>] ? local_clock+0x5f/0x70 [<ffffffff8137f17f>] debug_dma_unmap_page+0x5f/0x70 [<ffffffffa0225edc>] ? rhine_ack_events.isra.14+0x3c/0x50 [via_rhine] [<ffffffffa02275f8>] rhine_napipoll+0x1d8/0xd80 [via_rhine] [<ffffffff815d3d51>] ? net_rx_action+0xa1/0x380 [<ffffffff815d3e22>] net_rx_action+0x172/0x380 [<ffffffff8107345f>] __do_softirq+0xff/0x400 [<ffffffff81073925>] irq_exit+0xb5/0xc0 [<ffffffff81724cd6>] do_IRQ+0x56/0xc0 [<ffffffff81719ff2>] common_interrupt+0x72/0x72 <EOI> [<ffffffff8170ff57>] ? __slab_alloc+0x4c2/0x526 [<ffffffff811992e0>] ? mmap_region+0x2b0/0x5a0 [<ffffffff810d5807>] ? __lock_is_held+0x57/0x80 [<ffffffff811992e0>] ? mmap_region+0x2b0/0x5a0 [<ffffffff811bf1bf>] kmem_cache_alloc+0x2df/0x360 [<ffffffff811992e0>] mmap_region+0x2b0/0x5a0 [<ffffffff811998e6>] do_mmap_pgoff+0x316/0x3d0 [<ffffffff81183ca0>] vm_mmap_pgoff+0x90/0xc0 [<ffffffff81197d6c>] sys_mmap_pgoff+0x4c/0x190 [<ffffffff81367d7e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff8101eb42>] sys_mmap+0x22/0x30 [<ffffffff81722fd9>] system_call_fastpath+0x16/0x1b Usual problem with the usual fix, add the appropriate calls to dma_mapping_error where appropriate Untested, as I don't have hardware, but its pretty straightforward Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: David S. Miller <davem@davemloft.net> CC: Roger Luethi <rl@hellgate.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
1 parent 352900b
mutex-design.txt
Generic Mutex Subsystem
started by Ingo Molnar <mingo@redhat.com>
"Why on earth do we need a new mutex subsystem, and what's wrong
with semaphores?"
firstly, there's nothing wrong with semaphores. But if the simpler
mutex semantics are sufficient for your code, then there are a couple
of advantages of mutexes:
- 'struct mutex' is smaller on most architectures: E.g. on x86,
'struct semaphore' is 20 bytes, 'struct mutex' is 16 bytes.
A smaller structure size means less RAM footprint, and better
CPU-cache utilization.
- tighter code. On x86 i get the following .text sizes when
switching all mutex-alike semaphores in the kernel to the mutex
subsystem:
text data bss dec hex filename
3280380 868188 396860 4545428 455b94 vmlinux-semaphore
3255329 865296 396732 4517357 44eded vmlinux-mutex
that's 25051 bytes of code saved, or a 0.76% win - off the hottest
codepaths of the kernel. (The .data savings are 2892 bytes, or 0.33%)
Smaller code means better icache footprint, which is one of the
major optimization goals in the Linux kernel currently.
- the mutex subsystem is slightly faster and has better scalability for
contended workloads. On an 8-way x86 system, running a mutex-based
kernel and testing creat+unlink+close (of separate, per-task files)
in /tmp with 16 parallel tasks, the average number of ops/sec is:
Semaphores: Mutexes:
$ ./test-mutex V 16 10 $ ./test-mutex V 16 10
8 CPUs, running 16 tasks. 8 CPUs, running 16 tasks.
checking VFS performance. checking VFS performance.
avg loops/sec: 34713 avg loops/sec: 84153
CPU utilization: 63% CPU utilization: 22%
i.e. in this workload, the mutex based kernel was 2.4 times faster
than the semaphore based kernel, _and_ it also had 2.8 times less CPU
utilization. (In terms of 'ops per CPU cycle', the semaphore kernel
performed 551 ops/sec per 1% of CPU time used, while the mutex kernel
performed 3825 ops/sec per 1% of CPU time used - it was 6.9 times
more efficient.)
the scalability difference is visible even on a 2-way P4 HT box:
Semaphores: Mutexes:
$ ./test-mutex V 16 10 $ ./test-mutex V 16 10
4 CPUs, running 16 tasks. 8 CPUs, running 16 tasks.
checking VFS performance. checking VFS performance.
avg loops/sec: 127659 avg loops/sec: 181082
CPU utilization: 100% CPU utilization: 34%
(the straight performance advantage of mutexes is 41%, the per-cycle
efficiency of mutexes is 4.1 times better.)
- there are no fastpath tradeoffs, the mutex fastpath is just as tight
as the semaphore fastpath. On x86, the locking fastpath is 2
instructions:
c0377ccb <mutex_lock>:
c0377ccb: f0 ff 08 lock decl (%eax)
c0377cce: 78 0e js c0377cde <.text..lock.mutex>
c0377cd0: c3 ret
the unlocking fastpath is equally tight:
c0377cd1 <mutex_unlock>:
c0377cd1: f0 ff 00 lock incl (%eax)
c0377cd4: 7e 0f jle c0377ce5 <.text..lock.mutex+0x7>
c0377cd6: c3 ret
- 'struct mutex' semantics are well-defined and are enforced if
CONFIG_DEBUG_MUTEXES is turned on. Semaphores on the other hand have
virtually no debugging code or instrumentation. The mutex subsystem
checks and enforces the following rules:
* - only one task can hold the mutex at a time
* - only the owner can unlock the mutex
* - multiple unlocks are not permitted
* - recursive locking is not permitted
* - a mutex object must be initialized via the API
* - a mutex object must not be initialized via memset or copying
* - task may not exit with mutex held
* - memory areas where held locks reside must not be freed
* - held mutexes must not be reinitialized
* - mutexes may not be used in hardware or software interrupt
* contexts such as tasklets and timers
furthermore, there are also convenience features in the debugging
code:
* - uses symbolic names of mutexes, whenever they are printed in debug output
* - point-of-acquire tracking, symbolic lookup of function names
* - list of all locks held in the system, printout of them
* - owner tracking
* - detects self-recursing locks and prints out all relevant info
* - detects multi-task circular deadlocks and prints out all affected
* locks and tasks (and only those tasks)
Disadvantages
-------------
The stricter mutex API means you cannot use mutexes the same way you
can use semaphores: e.g. they cannot be used from an interrupt context,
nor can they be unlocked from a different context that which acquired
it. [ I'm not aware of any other (e.g. performance) disadvantages from
using mutexes at the moment, please let me know if you find any. ]
Implementation of mutexes
-------------------------
'struct mutex' is the new mutex type, defined in include/linux/mutex.h
and implemented in kernel/mutex.c. It is a counter-based mutex with a
spinlock and a wait-list. The counter has 3 states: 1 for "unlocked",
0 for "locked" and negative numbers (usually -1) for "locked, potential
waiters queued".
the APIs of 'struct mutex' have been streamlined:
DEFINE_MUTEX(name);
mutex_init(mutex);
void mutex_lock(struct mutex *lock);
int mutex_lock_interruptible(struct mutex *lock);
int mutex_trylock(struct mutex *lock);
void mutex_unlock(struct mutex *lock);
int mutex_is_locked(struct mutex *lock);
void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
int mutex_lock_interruptible_nested(struct mutex *lock,
unsigned int subclass);
int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
![swh spinner](/static/img/swh-spinner.gif)
Computing file changes ...