The loop over a volatile write was chosen because it got me close to
the single-threaded sysbench numbers, but it is limited by the CPU
rather than by memory bandwidth, so replace it with memset().
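
For illustration only, the two fill strategies contrasted above might look
roughly like the sketch below. The helper names fill_volatile() and
fill_memset() are hypothetical and are not taken from the benchmark source;
only memset() itself is named in this message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store through a volatile pointer so the compiler
 * must emit every write individually, one store per byte. */
static void fill_volatile(unsigned char *buf, size_t len, unsigned char value)
{
    volatile unsigned char *p = buf;

    for (size_t i = 0; i < len; i++)
        p[i] = value;
}

/* Hypothetical helper: hand the whole fill to the C library's memset(),
 * which is free to use wider stores than a byte-wise loop. */
static void fill_memset(unsigned char *buf, size_t len, unsigned char value)
{
    memset(buf, value, len);
}
```

Both helpers leave the buffer in the same state; they differ only in how
the stores are issued.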
This helps in cross-compilation, where the flags passed in from the
environment (e.g. ABI and architecture options) matter.
Signed-off-by: Khem Raj <raj.khem@gmail.com>
The memcpy() means that we do both read and write operations. The writes
are slower and they disturb the caches, so drop them to get cleaner
measurements.
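
As a rough sketch of what a read-only pass could look like (an assumption
for illustration, not necessarily the code in this change), the
hypothetical helper read_only_pass() below only loads from the buffer and
returns a sum so the compiler cannot discard the loads:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: walk the buffer and accumulate every word.
 * Nothing is stored back to memory; the caller should consume the
 * returned sum so the loads are not optimized away. */
static uint64_t read_only_pass(const uint64_t *buf, size_t words)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < words; i++)
        sum += buf[i];

    return sum;
}
```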
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>