The cpuacct split caused this build failure on UML:
kernel/sched/cpuacct.c:94:2: error: implicit declaration of function 'ERR_PTR'
Cc: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull CPU runtime stats/accounting fixes from Frederic Weisbecker:
" Some users are complaining that their threadgroup's runtime accounting
freezes after a week or so of intense cpu-bound workload. This set tries
to fix the issue by reducing the risk of multiplication overflow in the
cputime scaling code. "
Stanislaw Gruszka further explained the historic context and impact of the
bug:
" Commit 0cf55e1ec0 start to use scalling
for whole thread group, so increase chances of hitting multiplication
overflow, depending on how many CPUs are on the system.
We have multiplication utime * rtime for one thread since commit
b27f03d4bd.
Overflow will happen after:
rtime * utime > 0xffffffffffffffff jiffies
if thread utilize 100% of CPU time, that gives:
rtime > sqrt(0xffffffffffffffff) jiffies
ritme > sqrt(0xffffffffffffffff) / (24 * 60 * 60 * HZ) days
For HZ 100 it will be 497 days for HZ 1000 it will be 49 days.
Bug affect only users, who run CPU intensive application for that
long period. Also they have to be interested on utime,stime values,
as bug has no other visible effect as making those values incorrect. "
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Some users have reported that after running a process with
hundreds of threads on intensive CPU-bound loads, the cputime
of the group started to freeze after a few days.
This is due to how we scale the tick-based cputime against
the scheduler precise execution time value.
We add the values of all threads in the group and we multiply
that against the sum of the scheduler exec runtime of the whole
group.
This easily overflows after a few days/weeks of execution.
A proposed solution to solve this was to compute that multiplication
on stime instead of utime:
62188451f0
("cputime: Avoid multiplication overflow on utime scaling")
The rationale behind that was that it's easy for a thread to
spend most of its time in userspace under intensive CPU-bound workload
but it's much harder to do CPU-bound intensive long run in the kernel.
This postulate got defeated when a user recently reported he was still
seeing cputime freezes after the above patch. The workload that
triggers this issue relates to intensive networking workloads where
most of the cputime is consumed in the kernel.
To reduce much more the opportunities for multiplication overflow,
lets reduce the multiplication factors to the remainders of the division
between sched exec runtime and cputime. Assuming the difference between
these shouldn't ever be that large, it could work on many situations.
This gets the same results as in the upstream scaling code except for
a small difference: the upstream code always rounds the results to
the nearest integer not greater to what would be the precise result.
The new code rounds to the nearest integer either greater or not
greater. In practice this difference probably shouldn't matter but
it's worth mentioning.
If this solution appears not to be enough in the end, we'll
need to partly revert back to the behaviour prior to commit
0cf55e1ec0
("sched, cputime: Introduce thread_group_times()")
Back then, the scaling was done on exit() time before adding the cputime
of an exiting thread to the signal struct. And then we'll need to
scale one-by-one the live threads cputime in thread_group_cputime(). The
drawback may be a slightly slower code on exit time.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
All warnings:
In file included from kernel/sched/core.c:85:0:
kernel/sched/sched.h:1036:39: warning: 'struct sched_domain' declared inside parameter list
kernel/sched/sched.h:1036:39: warning: its scope is only this definition or declaration, which is probably not what you want
It's because struct sched_domain is defined inside #if CONFIG_SMP,
while update_group_power() is declared unconditionally.
Fix this warning by declaring update_group_power() only if
CONFIG_SMP=n.
Build tested with CONFIG_SMP enabled and then disabled.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5137F4BA.2060101@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>