Revision a8950e49bd241dc9ad90c24e92e23e95dbe9f018 authored by Romain Perier on 19 April 2016, 10:17:32 UTC, committed by Ley Foon Tan on 27 April 2016, 08:35:55 UTC
Depending on the size of the area to be memset'ed, the nios2 memset implementation
either uses a naive loop (for buffers smaller or equal than 8 bytes) or a more optimized
implementation (for buffers larger than 8 bytes). This implementation does 4-byte stores
rather than 1-byte stores to speed up memset.

However, we discovered that on our nios2 platform, memset() was not properly setting the
buffer to the expected value. A memset of 0xff would not set the entire buffer to 0xff, but to:

0xff 0x00 0xff 0x00 0xff 0x00 0xff 0x00 ...

Which is obviously incorrect. Our investigation has revealed that the problem lies in the
incorrect constraints used in the inline assembly.

The following piece of assembly, from the nios2 memset implementation, is supposed to
create a 4-byte value that repeats 4 times the 1-byte pattern passed as memset argument:

/* fill8 %3, %5 (c & 0xff) */
"       slli    %4, %5, 8\n"
"       or      %4, %4, %5\n"
"       slli    %3, %4, 16\n"
"       or      %3, %3, %4\n"

However, depending on the compiler and optimization level, this code might be compiled as:

34:	280a923a 	slli	r5,r5,8
38:	294ab03a 	or	r5,r5,r5
3c:	2808943a 	slli	r4,r5,16
40:	2148b03a 	or	r4,r4,r5

This is wrong because r5 gets used both for %5 and %4, which leads to the final pattern
stored in r4 to be 0xff00ff00 rather than the expected 0xffffffff.

%4 is defined with the "=r" constraint, i.e as an output operand. However, as explained in
http://www.ethernut.de/en/documents/arm-inline-asm.html, this does not prevent gcc from
using the same register for an output operand (%4) and input operand (%5). By using the
constraint modifier '&', we indicate that the register should be used for output only. With this
change, we get the following assembly output:

34:	2810923a 	slli	r8,r5,8
38:	4150b03a 	or	r8,r8,r5
3c:	400e943a 	slli	r7,r8,16
40:	3a0eb03a 	or	r7,r7,r8

Which correctly produces the 0xffffffff pattern when 0xff is passed as the memset() pattern.

It is worth mentioning the observed consequence of this bug: we were hitting the kernel
BUG() in mm/bootmem.c:__free() that verifies when marking a page as free that it was
previously marked as occupied (i.e that the bit was set to 1). The entire bootmem bitmap is
set to 0xff bit via a memset() during the bootmem initialization. The bootmem_free() call right
after the initialization was finding some bits to be set to 0, which didn't make sense since the
bitmap has just been memset'ed to 0xff. Except that due to the bug explained above, the
bitmap was in fact initialized to 0xff00ff00.

Thanks to Marek Vasut for his help and feedback.

Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Marek Vasut <marex@denx.de>
Acked-by: Ley Foon Tan <lftan@altera.com>
1 parent 02da2d7
History
File Mode Size
842
fonts
lz4
lzo
mpi
raid6
reed_solomon
xz
zlib_deflate
zlib_inflate
.gitignore -rw-r--r-- 70 bytes
Kconfig -rw-r--r-- 11.9 KB
Kconfig.debug -rw-r--r-- 67.6 KB
Kconfig.kasan -rw-r--r-- 1.8 KB
Kconfig.kgdb -rw-r--r-- 4.1 KB
Kconfig.kmemcheck -rw-r--r-- 2.9 KB
Kconfig.ubsan -rw-r--r-- 1.3 KB
Makefile -rw-r--r-- 7.4 KB
argv_split.c -rw-r--r-- 2.1 KB
asn1_decoder.c -rw-r--r-- 13.1 KB
assoc_array.c -rw-r--r-- 52.6 KB
atomic64.c -rw-r--r-- 4.1 KB
atomic64_test.c -rw-r--r-- 5.4 KB
audit.c -rw-r--r-- 1.7 KB
bcd.c -rw-r--r-- 261 bytes
bch.c -rw-r--r-- 35.6 KB
bitmap.c -rw-r--r-- 35.3 KB
bitrev.c -rw-r--r-- 1.9 KB
bsearch.c -rw-r--r-- 1.6 KB
btree.c -rw-r--r-- 19.2 KB
bug.c -rw-r--r-- 4.7 KB
build_OID_registry -rwxr-xr-x 4.7 KB
bust_spinlocks.c -rw-r--r-- 660 bytes
check_signature.c -rw-r--r-- 599 bytes
checksum.c -rw-r--r-- 5.0 KB
clz_ctz.c -rw-r--r-- 1.3 KB
clz_tab.c -rw-r--r-- 855 bytes
cmdline.c -rw-r--r-- 4.1 KB
compat_audit.c -rw-r--r-- 796 bytes
cordic.c -rw-r--r-- 2.5 KB
cpu-notifier-error-inject.c -rw-r--r-- 1.2 KB
cpu_rmap.c -rw-r--r-- 7.8 KB
cpumask.c -rw-r--r-- 4.6 KB
crc-ccitt.c -rw-r--r-- 3.0 KB
crc-itu-t.c -rw-r--r-- 2.8 KB
crc-t10dif.c -rw-r--r-- 1.6 KB
crc16.c -rw-r--r-- 2.8 KB
crc32.c -rw-r--r-- 45.4 KB
crc32defs.h -rw-r--r-- 2.0 KB
crc7.c -rw-r--r-- 2.6 KB
crc8.c -rw-r--r-- 2.4 KB
ctype.c -rw-r--r-- 1.4 KB
debug_info.c -rw-r--r-- 741 bytes
debug_locks.c -rw-r--r-- 1.2 KB
debugobjects.c -rw-r--r-- 26.1 KB
dec_and_lock.c -rw-r--r-- 784 bytes
decompress.c -rw-r--r-- 1.7 KB
decompress_bunzip2.c -rw-r--r-- 23.5 KB
decompress_inflate.c -rw-r--r-- 4.4 KB
decompress_unlz4.c -rw-r--r-- 4.2 KB
decompress_unlzma.c -rw-r--r-- 15.8 KB
decompress_unlzo.c -rw-r--r-- 7.1 KB
decompress_unxz.c -rw-r--r-- 10.9 KB
devres.c -rw-r--r-- 9.8 KB
digsig.c -rw-r--r-- 5.5 KB
div64.c -rw-r--r-- 4.1 KB
dma-debug.c -rw-r--r-- 42.2 KB
dma-noop.c -rw-r--r-- 1.7 KB
dump_stack.c -rw-r--r-- 1.2 KB
dynamic_debug.c -rw-r--r-- 25.1 KB
dynamic_queue_limits.c -rw-r--r-- 4.3 KB
earlycpio.c -rw-r--r-- 3.9 KB
extable.c -rw-r--r-- 3.1 KB
fault-inject.c -rw-r--r-- 6.0 KB
fdt.c -rw-r--r-- 69 bytes
fdt_empty_tree.c -rw-r--r-- 80 bytes
fdt_ro.c -rw-r--r-- 72 bytes
fdt_rw.c -rw-r--r-- 72 bytes
fdt_strerror.c -rw-r--r-- 78 bytes
fdt_sw.c -rw-r--r-- 72 bytes
fdt_wip.c -rw-r--r-- 73 bytes
find_bit.c -rw-r--r-- 4.5 KB
flex_array.c -rw-r--r-- 11.0 KB
flex_proportions.c -rw-r--r-- 6.9 KB
gcd.c -rw-r--r-- 313 bytes
gen_crc32table.c -rw-r--r-- 3.2 KB
genalloc.c -rw-r--r-- 21.6 KB
glob.c -rw-r--r-- 7.7 KB
halfmd4.c -rw-r--r-- 2.0 KB
hexdump.c -rw-r--r-- 8.3 KB
hweight.c -rw-r--r-- 1.9 KB
idr.c -rw-r--r-- 28.0 KB
inflate.c -rw-r--r-- 38.6 KB
int_sqrt.c -rw-r--r-- 652 bytes
interval_tree.c -rw-r--r-- 499 bytes
interval_tree_test.c -rw-r--r-- 2.3 KB
iomap.c -rw-r--r-- 6.5 KB
iomap_copy.c -rw-r--r-- 2.8 KB
iommu-common.c -rw-r--r-- 7.0 KB
iommu-helper.c -rw-r--r-- 1.0 KB
ioremap.c -rw-r--r-- 3.4 KB
iov_iter.c -rw-r--r-- 19.9 KB
irq_poll.c -rw-r--r-- 5.6 KB
irq_regs.c -rw-r--r-- 604 bytes
is_single_threaded.c -rw-r--r-- 1.3 KB
jedec_ddr_data.c -rw-r--r-- 3.0 KB
kasprintf.c -rw-r--r-- 1.4 KB
kfifo.c -rw-r--r-- 12.7 KB
klist.c -rw-r--r-- 10.3 KB
kobject.c -rw-r--r-- 26.0 KB
kobject_uevent.c -rw-r--r-- 11.3 KB
kstrtox.c -rw-r--r-- 10.6 KB
kstrtox.h -rw-r--r-- 254 bytes
lcm.c -rw-r--r-- 441 bytes
libcrc32c.c -rw-r--r-- 2.1 KB
list_debug.c -rw-r--r-- 2.6 KB
list_sort.c -rw-r--r-- 6.8 KB
llist.c -rw-r--r-- 3.1 KB
locking-selftest-hardirq.h -rw-r--r-- 207 bytes
locking-selftest-mutex.h -rw-r--r-- 120 bytes
locking-selftest-rlock-hardirq.h -rw-r--r-- 74 bytes
locking-selftest-rlock-softirq.h -rw-r--r-- 74 bytes
locking-selftest-rlock.h -rw-r--r-- 158 bytes
locking-selftest-rsem.h -rw-r--r-- 163 bytes
locking-selftest-softirq.h -rw-r--r-- 207 bytes
locking-selftest-spin-hardirq.h -rw-r--r-- 73 bytes
locking-selftest-spin-softirq.h -rw-r--r-- 73 bytes
locking-selftest-spin.h -rw-r--r-- 118 bytes
locking-selftest-wlock-hardirq.h -rw-r--r-- 74 bytes
locking-selftest-wlock-softirq.h -rw-r--r-- 74 bytes
locking-selftest-wlock.h -rw-r--r-- 158 bytes
locking-selftest-wsem.h -rw-r--r-- 163 bytes
locking-selftest.c -rw-r--r-- 40.1 KB
lockref.c -rw-r--r-- 3.9 KB
lru_cache.c -rw-r--r-- 19.4 KB
md5.c -rw-r--r-- 3.7 KB
memory-notifier-error-inject.c -rw-r--r-- 1.1 KB
memweight.c -rw-r--r-- 999 bytes
net_utils.c -rw-r--r-- 604 bytes
netdev-notifier-error-inject.c -rw-r--r-- 1.5 KB
nlattr.c -rw-r--r-- 12.5 KB
nmi_backtrace.c -rw-r--r-- 4.5 KB
notifier-error-inject.c -rw-r--r-- 2.7 KB
notifier-error-inject.h -rw-r--r-- 614 bytes
of-reconfig-notifier-error-inject.c -rw-r--r-- 1.3 KB
oid_registry.c -rw-r--r-- 3.8 KB
once.c -rw-r--r-- 1.3 KB
parser.c -rw-r--r-- 7.1 KB
pci_iomap.c -rw-r--r-- 4.2 KB
percpu-refcount.c -rw-r--r-- 11.7 KB
percpu_counter.c -rw-r--r-- 5.5 KB
percpu_ida.c -rw-r--r-- 9.5 KB
percpu_test.c -rw-r--r-- 3.2 KB
plist.c -rw-r--r-- 5.9 KB
pm-notifier-error-inject.c -rw-r--r-- 1.1 KB
proportions.c -rw-r--r-- 9.3 KB
radix-tree.c -rw-r--r-- 42.5 KB
random32.c -rw-r--r-- 12.7 KB
ratelimit.c -rw-r--r-- 1.5 KB
rational.c -rw-r--r-- 1.5 KB
rbtree.c -rw-r--r-- 16.2 KB
rbtree_test.c -rw-r--r-- 5.5 KB
reciprocal_div.c -rw-r--r-- 492 bytes
rhashtable.c -rw-r--r-- 20.6 KB
scatterlist.c -rw-r--r-- 18.8 KB
seq_buf.c -rw-r--r-- 7.8 KB
sg_split.c -rw-r--r-- 5.1 KB
sha1.c -rw-r--r-- 6.1 KB
show_mem.c -rw-r--r-- 1.2 KB
smp_processor_id.c -rw-r--r-- 1.3 KB
sort.c -rw-r--r-- 2.9 KB
stackdepot.c -rw-r--r-- 8.3 KB
stmp_device.c -rw-r--r-- 2.1 KB
string.c -rw-r--r-- 20.0 KB
string_helpers.c -rw-r--r-- 11.3 KB
strncpy_from_user.c -rw-r--r-- 3.0 KB
strnlen_user.c -rw-r--r-- 4.3 KB
swiotlb.c -rw-r--r-- 27.3 KB
syscall.c -rw-r--r-- 2.4 KB
test-kstrtox.c -rw-r--r-- 17.3 KB
test-string_helpers.c -rw-r--r-- 10.3 KB
test_bitmap.c -rw-r--r-- 9.0 KB
test_bpf.c -rw-r--r-- 133.6 KB
test_firmware.c -rw-r--r-- 4.3 KB
test_hexdump.c -rw-r--r-- 5.7 KB
test_kasan.c -rw-r--r-- 6.9 KB
test_module.c -rw-r--r-- 753 bytes
test_printf.c -rw-r--r-- 12.4 KB
test_rhashtable.c -rw-r--r-- 10.3 KB
test_static_key_base.c -rw-r--r-- 2.0 KB
test_static_keys.c -rw-r--r-- 6.0 KB
test_user_copy.c -rw-r--r-- 3.1 KB
textsearch.c -rw-r--r-- 9.4 KB
timerqueue.c -rw-r--r-- 3.2 KB
ts_bm.c -rw-r--r-- 5.3 KB
ts_fsm.c -rw-r--r-- 10.6 KB
ts_kmp.c -rw-r--r-- 4.3 KB
ubsan.c -rw-r--r-- 10.9 KB
ubsan.h -rw-r--r-- 1.5 KB
ucs2_string.c -rw-r--r-- 2.4 KB
usercopy.c -rw-r--r-- 197 bytes
uuid.c -rw-r--r-- 1.3 KB
vsprintf.c -rw-r--r-- 66.8 KB

back to top