https://github.com/torvalds/linux
Revision e658a6f14d7c0243205f035979d0ecf6c12a036f authored by Chris Metcalf on 16 November 2016, 16:18:05 UTC, committed by Chris Metcalf on 23 November 2016, 20:28:54 UTC
For large values of "mult" and long uptimes, the intermediate result of "cycles * mult" can overflow 64 bits. For example, the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock; we have mult = 853, and after 208.5 days, we overflow 64 bits. Since clocksource_cyc2ns() is intended to be used for relative cycle counts, not absolute cycle counts, performance is more importance than accepting a wider range of cycle values. So, just use mult_frac() directly in tile's sched_clock(). Commit 4cecf6d401a0 ("sched, x86: Avoid unnecessary overflow in sched_clock") by Salman Qazi results in essentially the same generated code for x86 as this change does for tile. In fact, a follow-on change by Salman introduced mult_frac() and switched to using it, so the C code was largely identical at that point too. Peter Zijlstra then added mul_u64_u32_shr() and switched x86 to use it. This is, in principle, better; by optimizing the 64x64->64 multiplies to be 32x32->64 multiplies we can potentially save some time. However, the compiler piplines the 64x64->64 multiplies pretty well, and the conditional branch in the generic mul_u64_u32_shr() causes some bubbles in execution, with the result that it's pretty much a wash. If tilegx provided its own implementation of mul_u64_u32_shr() without the conditional branch, we could potentially save 3 cycles, but that seems like small gain for a fair amount of additional build scaffolding; no other platform currently provides a mul_u64_u32_shr() override, and tile doesn't currently have an <asm/div64.h> header to put the override in. Additionally, gcc currently has an optimization bug that prevents it from recognizing the opportunity to use a 32x32->64 multiply, and so the result would be no better than the existing mult_frac() until such time as the compiler is fixed. For now, just using mult_frac() seems like the right answer. Cc: stable@kernel.org [v3.4+] Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
1 parent ded9b5d
Tip revision: e658a6f14d7c0243205f035979d0ecf6c12a036f authored by Chris Metcalf on 16 November 2016, 16:18:05 UTC
tile: avoid using clocksource_cyc2ns with absolute cycle count
tile: avoid using clocksource_cyc2ns with absolute cycle count
Tip revision: e658a6f
File | Mode | Size |
---|---|---|
Documentation | ||
arch | ||
block | ||
certs | ||
crypto | ||
drivers | ||
firmware | ||
fs | ||
include | ||
init | ||
ipc | ||
kernel | ||
lib | ||
mm | ||
net | ||
samples | ||
scripts | ||
security | ||
sound | ||
tools | ||
usr | ||
virt | ||
.cocciconfig | -rw-r--r-- | 59 bytes |
.get_maintainer.ignore | -rw-r--r-- | 31 bytes |
.gitattributes | -rw-r--r-- | 30 bytes |
.gitignore | -rw-r--r-- | 1.3 KB |
.mailmap | -rw-r--r-- | 7.5 KB |
COPYING | -rw-r--r-- | 18.3 KB |
CREDITS | -rw-r--r-- | 96.0 KB |
Kbuild | -rw-r--r-- | 2.8 KB |
Kconfig | -rw-r--r-- | 252 bytes |
MAINTAINERS | -rw-r--r-- | 373.5 KB |
Makefile | -rw-r--r-- | 57.5 KB |
README | -rw-r--r-- | 17.9 KB |
REPORTING-BUGS | -rw-r--r-- | 7.3 KB |
![swh spinner](/static/img/swh-spinner.gif)
Computing file changes ...