Commit Graph

16078 Commits

Author SHA1 Message Date
Mark Brown
15fdd2469e Merge remote-tracking branch 'lsk/v3.10/topic/big.LITTLE' into linux-linaro-lsk 2014-05-15 16:55:19 +01:00
Chris Redpath
d1df056f9e hmp: dont attempt to pull tasks if affinity doesn't allow it
When looking for a task to be idle-pulled, don't consider tasks
where the affinity does not allow that task to be placed on the
target CPU. Also ensure that tasks with restricted affinity
do not block selecting other unrestricted busy tasks.

Use the knowledge of target CPU more effectively in idle pull
by passing to hmp_get_heaviest_task when we know it, otherwise
only checking for general affinity matches with any of the CPUs
in the bigger HMP domain.

We still need to explicitly check affinity is allowed in idle pull
since if we find no match in hmp_get_heaviest_task we will return
the current one, which may not be affine to the new CPU despite
having high enough load. In this case, there is nothing to move.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-05-09 17:22:39 +01:00
Chris Redpath
940407d585 hmp: Use idle pull to perform forced up-migrations
When a normal forced up-migration takes place we stop the task to
be migrated while the target CPU becomes available. This delay can
range from 80us to 1500us on TC2 if the target CPU is in a deep idle
state.

Instead, interrupt the target CPU and ask it to pull a task.
This lets the current eligible task continue executing on the
original CPU while the target CPU wakes. Use a pinned timer to
prevent the pulling CPU going back into power-down with pending
up-migrations.

If we trigger for a nohz kick, it doesn't matter about triggering
for an idle pull since the idle_pull flag will be set when we
execute the softirq and we'll still do the idle pull.

If the target CPU is busy, we will not pull any tasks.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-05-09 17:22:39 +01:00
Chris Redpath
0168997c0a sched: hmp: unify active migration code
The HMP active migration code is functionally identical to the CFS
active migration code apart from one flag check. Share the code
and make the flag check optional.

Two wrapper functions allow the flag check to be present or not.

Thanks to tixy@linaro.org for pointing out the build break and a
good solution in an earlier version.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-05-09 17:22:39 +01:00
Chris Redpath
84efcd0cc5 hmp: sched: Clean up hmp_up_threshold checks into a utility fn
In anticipation of modifying the up_threshold handling, make all
instances use the same utility fn to check if a task is eligible
for up-migration. This also removes the previous difference in
threshold comparison where up-migration used '!<threshold' and
idle pull used '>threshold' to decide up-migration eligibility.
Make them both use '!<threshold' instead for consistency, although
this is unlikely to change any results.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-05-09 17:22:39 +01:00
Mark Brown
2cae6ea4bb Merge remote-tracking branch 'lsk/v3.10/topic/big.LITTLE' into linux-linaro-lsk 2014-05-07 14:55:58 +01:00
Chris Redpath
1ade57e54e sched: hmp: Change small task packing defaults for all platforms
All platforms other than TC2 default to enabling packing. Since TC2
shows no performance or energy degradation with this feature enabled
make it default enabled the same as everyone else.
Likewise, vendors have been including TC2 support in multi-machine
kernel builds so they expect the default thresholds to remain the
same when the TC2 #ifdef is removed.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-05-07 11:34:00 +01:00
Mark Brown
03b1200275 Merge tag 'v3.10.39' into linux-linaro-lsk
This is the 3.10.39 stable release
2014-05-07 09:50:01 +01:00
Liu Hua
82c647cbfc hung_task: check the value of "sysctl_hung_task_timeout_sec"
commit 80df284765 upstream.

As sysctl_hung_task_timeout_sec is unsigned long, when this value is
larger then LONG_MAX/HZ, the function schedule_timeout_interruptible in
watchdog will return immediately without sleep and with print :

  schedule_timeout: wrong timeout value ffffffffffffff83

and then the funtion watchdog will call schedule_timeout_interruptible
again and again.  The screen will be filled with

	"schedule_timeout: wrong timeout value ffffffffffffff83"

This patch does some check and correction in sysctl, to let the function
schedule_timeout_interruptible allways get the valid parameter.

Signed-off-by: Liu Hua <sdu.liu@huawei.com>
Tested-by: Satoru Takeuchi <satoru.takeuchi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06 07:55:32 -07:00
Oleg Nesterov
1673978155 exit: call disassociate_ctty() before exit_task_namespaces()
commit c39df5fa37 upstream.

Commit 8aac62706a ("move exit_task_namespaces() outside of
exit_notify()") breaks pppd and the exiting service crashes the kernel:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
    IP: ppp_register_channel+0x13/0x20 [ppp_generic]
    Call Trace:
      ppp_asynctty_open+0x12b/0x170 [ppp_async]
      tty_ldisc_open.isra.2+0x27/0x60
      tty_ldisc_hangup+0x1e3/0x220
      __tty_hangup+0x2c4/0x440
      disassociate_ctty+0x61/0x270
      do_exit+0x7f2/0xa50

ppp_register_channel() needs ->net_ns and current->nsproxy == NULL.

Move disassociate_ctty() before exit_task_namespaces(), it doesn't make
sense to delay it after perf_event_exit_task() or cgroup_exit().

This also allows to use task_work_add() inside the (nontrivial) code
paths in disassociate_ctty().

Investigated by Peter Hurley.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Sree Harsha Totakura <sreeharsha@totakura.in>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Sree Harsha Totakura <sreeharsha@totakura.in>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-26 17:15:36 -07:00
Oleg Nesterov
a756e3d4c7 wait: fix reparent_leader() vs EXIT_DEAD->EXIT_ZOMBIE race
commit dfccbb5e49 upstream.

wait_task_zombie() first does EXIT_ZOMBIE->EXIT_DEAD transition and
drops tasklist_lock.  If this task is not the natural child and it is
traced, we change its state back to EXIT_ZOMBIE for ->real_parent.

The last transition is racy, this is even documented in 50b8d25748
"ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE
race".  wait_consider_task() tries to detect this transition and clear
->notask_error but we can't rely on ptrace_reparented(), debugger can
exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE.

And there is another problem which were missed before: this transition
can also race with reparent_leader() which doesn't reset >exit_signal if
EXIT_DEAD, assuming that this task must be reaped by someone else.  So
the tracee can be re-parented with ->exit_signal != SIGCHLD, and if
/sbin/init doesn't use __WALL it becomes unreapable.

Change reparent_leader() to update ->exit_signal even if EXIT_DEAD.
Note: this is the simple temporary hack for -stable, it doesn't try to
solve all problems, it will be reverted by the next changes.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Reported-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Michal Schmidt <mschmidt@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Lennart Poettering <lpoetter@redhat.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-26 17:15:36 -07:00
Oleg Nesterov
a15c36b2de pid_namespace: pidns_get() should check task_active_pid_ns() != NULL
commit d23082257d upstream.

pidns_get()->get_pid_ns() can hit ns == NULL. This task_struct can't
go away, but task_active_pid_ns(task) is NULL if release_task(task)
was already called. Alternatively we could change get_pid_ns(ns) to
check ns != NULL, but it seems that other callers are fine.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-26 17:15:34 -07:00
Mikulas Patocka
98afe6dfde user namespace: fix incorrect memory barriers
commit e79323bd87 upstream.

smp_read_barrier_depends() can be used if there is data dependency between
the readers - i.e. if the read operation after the barrier uses address
that was obtained from the read operation before the barrier.

In this file, there is only control dependency, no data dependecy, so the
use of smp_read_barrier_depends() is incorrect. The code could fail in the
following way:
* the cpu predicts that idx < entries is true and starts executing the
  body of the for loop
* the cpu fetches map->extent[0].first and map->extent[0].count
* the cpu fetches map->nr_extents
* the cpu verifies that idx < extents is true, so it commits the
  instructions in the body of the for loop

The problem is that in this scenario, the cpu read map->extent[0].first
and map->nr_extents in the wrong order. We need a full read memory barrier
to prevent it.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-26 17:15:34 -07:00
Alex Shi
dc65fae3f7 Merge tag 'v3.10.37' into linux-linaro-lsk
This is the 3.10.37 stable release
2014-04-15 09:57:50 +08:00
Heiko Carstens
f26c70a452 futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test
commit 03b8c7b623 upstream.

If an architecture has futex_atomic_cmpxchg_inatomic() implemented and there
is no runtime check necessary, allow to skip the test within futex_init().

This allows to get rid of some code which would always give the same result,
and also allows the compiler to optimize a couple of if statements away.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Finn Thain <fthain@telegraphics.com.au>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Link: http://lkml.kernel.org/r/20140302120947.GA3641@osiris
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[geert: Backported to v3.10..v3.13]
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-14 06:42:19 -07:00
Mark Brown
1859631f14 Merge branch 'v3.10/topic/big.LITTLE' of git://git.linaro.org/kernel/linux-linaro-stable into linux-linaro-lsk 2014-04-08 18:47:40 +01:00
Jon Medhurst
db3dba6818 Revert "hmp: sched: Clean up hmp_up_threshold checks into a utility fn"
This reverts commit 765aae26e6.

Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-04-08 16:43:25 +01:00
Jon Medhurst
11971ff25f Revert "sched: hmp: unify active migration code"
This reverts commit 0baa5811ba.

Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-04-08 16:43:24 +01:00
Jon Medhurst
7e1f7d3d4e Revert "hmp: Use idle pull to perform forced up-migrations"
This reverts commit aae7721f20.

Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-04-08 16:43:23 +01:00
Jon Medhurst
8503bfd723 Revert "hmp: dont attempt to pull tasks if affinity doesn't allow it"
This reverts commit 5a570cfc01.

Signed-off-by: Jon Medhurst <tixy@linaro.org>
2014-04-08 16:43:19 +01:00
Mark Brown
f3124ba519 Merge tag 'v3.10.35' into linux-linaro-lsk
This is the 3.10.35 stable release
2014-03-31 23:51:22 +01:00
Gerald Schaefer
ccdb5fa37f sched/autogroup: Fix race with task_groups list
commit 41261b6a83 upstream.

In autogroup_create(), a tg is allocated and added to the task_groups
list. If CONFIG_RT_GROUP_SCHED is set, this tg is then modified while on
the list, without locking. This can race with someone walking the list,
like __enable_runtime() during CPU unplug, and result in a use-after-free
bug.

To fix this, move sched_online_group(), which adds the tg to the list,
to the end of the autogroup_create() function after the modification.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1369411669-46971-2-git-send-email-gerald.schaefer@de.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-31 09:58:14 -07:00
Vaibhav Nagarnaik
a1c10a94ff tracing: Fix array size mismatch in format string
commit 87291347c4 upstream.

In event format strings, the array size is reported in two locations.
One in array subscript and then via the "size:" attribute. The values
reported there have a mismatch.

For e.g., in sched:sched_switch the prev_comm and next_comm character
arrays have subscript values as [32] where as the actual field size is
16.

name: sched_switch
ID: 301
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1;signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:char prev_comm[32];       offset:8;       size:16;        signed:1;
        field:pid_t prev_pid;   offset:24;      size:4; signed:1;
        field:int prev_prio;    offset:28;      size:4; signed:1;
        field:long prev_state;  offset:32;      size:8; signed:1;
        field:char next_comm[32];       offset:40;      size:16;        signed:1;
        field:pid_t next_pid;   offset:56;      size:4; signed:1;
        field:int next_prio;    offset:60;      size:4; signed:1;

After bisection, the following commit was blamed:
92edca0 tracing: Use direct field, type and system names

This commit removes the duplication of strings for field->name and
field->type assuming that all the strings passed in
__trace_define_field() are immutable. This is not true for arrays, where
the type string is created in event_storage variable and field->type for
all array fields points to event_storage.

Use __stringify() to create a string constant for the type string.

Also, get rid of event_storage and event_storage_mutex that are not
needed anymore.

also, an added benefit is that this reduces the overhead of events a bit more:

   text    data     bss     dec     hex filename
8424787 2036472 1302528 11763787         b3804b vmlinux
8420814 2036408 1302528 11759750         b37086 vmlinux.patched

Link: http://lkml.kernel.org/r/1392349908-29685-1-git-send-email-vnagarnaik@google.com

Cc: Laurent Chavey <chavey@google.com>
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-31 09:58:12 -07:00
Mark Brown
7dcba49cf6 Merge branch 'for-lsk' of git://git.linaro.org/arm/big.LITTLE/mp into linux-linaro-lsk 2014-03-31 16:36:42 +01:00
Alex Shi
02e11cd173 Merge tag 'v3.10.34' into linux-linaro-lsk
This is 3.10.34 stable release
2014-03-28 10:58:52 +08:00