Pull timer updates from Thomas Gleixner:
"Rather large, but nothing exiting:
- new range check for settimeofday() to prevent that boot time
becomes negative.
- fix for file time rounding
- a few simplifications of the hrtimer code
- fix for the proc/timerlist code so the output of clock realtime
timers is accurate
- more y2038 work
- tree wide conversion of clockevent drivers to the new callbacks"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (88 commits)
hrtimer: Handle failure of tick_init_highres() gracefully
hrtimer: Unconfuse switch_hrtimer_base() a bit
hrtimer: Simplify get_target_base() by returning current base
hrtimer: Drop return code of hrtimer_switch_to_hres()
time: Introduce timespec64_to_jiffies()/jiffies_to_timespec64()
time: Introduce current_kernel_time64()
time: Introduce struct itimerspec64
time: Add the common weak version of update_persistent_clock()
time: Always make sure wall_to_monotonic isn't positive
time: Fix nanosecond file time rounding in timespec_trunc()
timer_list: Add the base offset so remaining nsecs are accurate for non monotonic timers
cris/time: Migrate to new 'set-state' interface
kernel: broadcast-hrtimer: Migrate to new 'set-state' interface
xtensa/time: Migrate to new 'set-state' interface
unicore/time: Migrate to new 'set-state' interface
um/time: Migrate to new 'set-state' interface
sparc/time: Migrate to new 'set-state' interface
sh/localtimer: Migrate to new 'set-state' interface
score/time: Migrate to new 'set-state' interface
s390/time: Migrate to new 'set-state' interface
...
Pull NOHZ updates from Ingo Molnar:
"The main changes, mostly written by Frederic Weisbecker, include:
- Fix some jiffies based cputime assumptions. (No real harm because
the concerned code isn't used by full dynticks.)
- Simplify jiffies <-> usecs conversions. Remove dead code.
- Remove early hacks on nohz full code that avoided messing up idle
nohz internals. Now nohz integrates well full and idle and such
hack have become needless.
- Restart nohz full tick from irq exit. (A simplification and a
preparation for future optimization on scheduler kick to nohz
full)
- Code cleanups.
- Tile driver isolation enhancement on top of nohz. (Chris Metcalf)"
* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
nohz: Remove useless argument on tick_nohz_task_switch()
nohz: Move tick_nohz_restart_sched_tick() above its users
nohz: Restart nohz full tick from irq exit
nohz: Remove idle task special case
nohz: Prevent tilegx network driver interrupts
alpha: Fix jiffies based cputime assumption
apm32: Fix cputime == jiffies assumption
jiffies: Remove HZ > USEC_PER_SEC special case
Pull RCU updates from Ingo Molnar:
"The main RCU changes in this cycle are:
- the combination of tree geometry-initialization simplifications and
OS-jitter-reduction changes to expedited grace periods. These two
are stacked due to the large number of conflicts that would
otherwise result.
- privatize smp_mb__after_unlock_lock().
This commit moves the definition of smp_mb__after_unlock_lock() to
kernel/rcu/tree.h, in recognition of the fact that RCU is the only
thing using this, that nothing else is likely to use it, and that
it is likely to go away completely.
- documentation updates.
- torture-test updates.
- misc fixes"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
rcu,locking: Privatize smp_mb__after_unlock_lock()
rcu: Silence lockdep false positive for expedited grace periods
rcu: Don't disable CPU hotplug during OOM notifiers
scripts: Make checkpatch.pl warn on expedited RCU grace periods
rcu: Update MAINTAINERS entry
rcu: Clarify CONFIG_RCU_EQS_DEBUG help text
rcu: Fix backwards RCU_LOCKDEP_WARN() in synchronize_rcu_tasks()
rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()
rcu: Make rcu_is_watching() really notrace
cpu: Wait for RCU grace periods concurrently
rcu: Create a synchronize_rcu_mult()
rcu: Fix obsolete priority-boosting comment
rcu: Use WRITE_ONCE in RCU_INIT_POINTER
rcu: Hide RCU_NOCB_CPU behind RCU_EXPERT
rcu: Add RCU-sched flavors of get-state and cond-sync
rcu: Add fastpath bypassing funnel locking
rcu: Rename RCU_GP_DONE_FQS to RCU_GP_DOING_FQS
rcu: Pull out wait_event*() condition into helper function
documentation: Describe new expedited stall warnings
rcu: Add stall warnings to synchronize_sched_expedited()
...
Commit 75e3b37d05 ("hrtimer: Drop return code of hrtimer_switch_to_hres()")
drops the return code of hrtimer_switch_to_hres(). While doing so, it also
drops the return statement itself on failure. This may cause a system hang.
Seen when running arm:multi_v7_defconfig in qemu with devicetree file
vexpress-v2p-ca9.
Fixes: 75e3b37d05 ("hrtimer: Drop return code of hrtimer_switch_to_hres()")
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/1440231047-16256-1-git-send-email-linux@roeck-us.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The variable called "this_base" is confusing because its name suggests
it's of "struct hrtimer_clock_base" type, along with "base" and "new_base"
which doesn't help understanding this complicated function.
Make its name clearer and fix the misleading comment while at it.
[ tglx: Fixed the comment for real ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1439907509-9553-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The conversion between struct timespec and jiffies is not year 2038
safe on 32bit systems. Introduce timespec64_to_jiffies() and
jiffies_to_timespec64() functions which use struct timespec64 to
make it ready for 2038 issue.
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
The weak update_persistent_clock64() calls update_persistent_clock(),
if the architecture defines an update_persistent_clock64() to replace
and remove its update_persistent_clock() version, when building the
kernel the linker will throw an undefined symbol error, that is, any
arch that switches to update_persistent_clock64() will have this issue.
To solve the issue, we add the common weak update_persistent_clock().
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Two issues were found on an IMX6 development board without an
enabled RTC device(resulting in the boot time and monotonic
time being initialized to 0).
Issue 1:exportfs -a generate:
"exportfs: /opt/nfs/arm does not support NFS export"
Issue 2:cat /proc/stat:
"btime 4294967236"
The same issues can be reproduced on x86 after running the
following code:
int main(void)
{
struct timeval val;
int ret;
val.tv_sec = 0;
val.tv_usec = 0;
ret = settimeofday(&val, NULL);
return 0;
}
Two issues are different symptoms of same problem:
The reason is a positive wall_to_monotonic pushes boot time back
to the time before Epoch, and getboottime will return negative
value.
In symptom 1:
negative boot time cause get_expiry() to overflow time_t
when input expire time is 2147483647, then cache_flush()
always clears entries just added in ip_map_parse.
In symptom 2:
show_stat() uses "unsigned long" to print negative btime
value returned by getboottime.
This patch fix the problem by prohibiting time from being set to a value which
would cause a negative boot time. As a result one can't set the CLOCK_REALTIME
time prior to (1970 + system uptime).
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Wang YanQing <udknight@gmail.com>
[jstultz: reworded commit message]
Signed-off-by: John Stultz <john.stultz@linaro.org>
timespec_trunc() avoids rounding if granularity <= nanoseconds-per-jiffie
(or TICK_NSEC). This optimization assumes that:
1. current_kernel_time().tv_nsec is already rounded to TICK_NSEC (i.e.
with HZ=1000 you'd get 1000000, 2000000, 3000000... but never 1000001).
This is no longer true (probably since hrtimers introduced in 2.6.16).
2. TICK_NSEC is evenly divisible by all possible granularities. This may
be true for HZ=100, 250, 1000, but obviously not for HZ=300 /
TICK_NSEC=3333333 (introduced in 2.6.20).
Thus, sub-second portions of in-core file times are not rounded to on-disk
granularity. I.e. file times may change when the inode is re-read from disk
or when the file system is remounted.
This affects all file systems with file time granularities > 1 ns and < 1s,
e.g. CEPH (1000 ns), UDF (1000 ns), CIFS (100 ns), NTFS (100 ns) and FUSE
(configurable from user mode via struct fuse_init_out.time_gran).
Steps to reproduce with e.g. UDF:
$ dd if=/dev/zero of=udfdisk count=10000 && mkudffs udfdisk
$ mkdir udf && mount udfdisk udf
$ touch udf/test && stat -c %y udf/test
2015-06-09 10:22:56.130006767 +0200
$ umount udf && mount udfdisk udf
$ stat -c %y udf/test
2015-06-09 10:22:56.130006000 +0200
Remounting truncates the mtime to 1 µs.
Fix the rounding in timespec_trunc() and update the documentation.
timespec_trunc() is exclusively used to calculate inode's [acm]time (mostly
via current_fs_time()), and always with super_block.s_time_gran as second
argument. So this can safely be changed without side effects.
Note: This does _not_ fix the issue for FAT's 2 second mtime resolution,
as super_block.s_time_gran isn't prepared to handle different ctime /
mtime / atime resolutions nor resolutions > 1 second.
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
I noticed for non-monotonic timers in timer_list, some of the
output looked a little confusing.
For example:
#1: <0000000000000000>, posix_timer_fn, S:01, hrtimer_start_range_ns, leap-a-day/2360
# expires at 1434412800000000000-1434412800000000000 nsecs [in 1434410725062375469 to 1434410725062375469 nsecs]
You'll note the relative time till the expiration "[in xxx to
yyy nsecs]" is incorrect. This is because its printing the delta
between CLOCK_MONOTONIC time to the CLOCK_REALTIME expiration.
This patch fixes this issue by adding the clock offset to the
"now" time which we use to calculate the delta.
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Pull RCU changes from Paul E. McKenney:
- The combination of tree geometry-initialization simplifications
and OS-jitter-reduction changes to expedited grace periods.
These two are stacked due to the large number of conflicts
that would otherwise result.
[ With one addition, a temporary commit to silence a lockdep false
positive. Additional changes to the expedited grace-period
primitives (queued for 4.4) remove the cause of this false
positive, and therefore include a revert of this temporary commit. ]
- Documentation updates.
- Torture-test updates.
- Miscellaneous fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Migrate broadcast-hrtimer driver to the new 'set-state' interface
provided by clockevents core, the earlier 'set-mode' interface is marked
obsolete now.
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Restart the tick when necessary from the irq exit path. It makes nohz
full more flexible, simplify the related IPIs and doesn't bring
significant overhead on irq exit.
In a longer term view, it will allow us to piggyback the nohz kick
on the scheduler IPI in the future instead of sending a dedicated IPI
that often doubles the scheduler IPI on task wakeup. This will require
more changes though including careful review of resched_curr() callers
to include nohz full needs.
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
On nohz full early days, idle dynticks and full dynticks weren't well
integrated and we couldn't risk full dynticks calls on idle without
risking messing up tick idle statistics. This is why we prevented such
thing to happen.
Nowadays full dynticks and idle dynticks are better integrated and
interact without known issue.
So lets remove that.
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
tick_broadcast_oneshot_control got moved from tick-broadcast to
tick-common, but the export stayed in the old place. Fix it up.
Fixes: f32dd11705 'tick/broadcast: Make idle check independent from mode and config'
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Dan reported that the recent changes to the broadcast code introduced
a potential NULL dereference.
Add the proper check.
Fixes: e045431190 "tick/broadcast: Sanity check the shutdown of the local clock_event"
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>