Don't sample the first entry outside the timer as this is a different
code path which produces a different offset from the clock tick.
Use sys_clock_hw_cycles_per_sec() to be compatible with systems that
read their hardware clock frequency at run time.
Perform cycle difference computations with uint64_t. If ever the
magnitude of the absolute clock cycle values is greater than 52 bits
then the cast to a double will actually lose accuracy.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>