Commit Graph

413408 Commits

Author SHA1 Message Date
Wanpeng Li de1b301a19 sched/numa: Use wrapper function task_node to get node which task is on
Use wrapper function task_node to get node which task is on.

Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1386833006-6600-2-git-send-email-liwanp@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-17 15:24:39 +01:00
Wanpeng Li 1bd53a7efd sched/numa: Drop sysctl_numa_balancing_settle_count sysctl
commit 887c290e (sched/numa: Decide whether to favour task or group weights
based on swap candidate relationships) drop the check against
sysctl_numa_balancing_settle_count, this patch remove the sysctl.

Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Link: http://lkml.kernel.org/r/1386833006-6600-1-git-send-email-liwanp@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-17 15:24:38 +01:00
Ingo Molnar ffe732c243 Merge branch 'sched/urgent' into sched/core
Merge the latest batch of fixes before applying development patches.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-17 15:22:35 +01:00
Kirill Tkhai 757dfcaa41 sched/rt: Fix rq's cpupri leak while enqueue/dequeue child RT entities
This patch touches the RT group scheduling case.

Functions inc_rt_prio_smp() and dec_rt_prio_smp() change (global) rq's
priority, while rt_rq passed to them may be not the top-level rt_rq.
This is wrong, because changing of priority on a child level does not
guarantee that the priority is the highest all over the rq. So, this
leak makes RT balancing unusable.

The short example: the task having the highest priority among all rq's
RT tasks (no one other task has the same priority) are waking on a
throttle rt_rq.  The rq's cpupri is set to the task's priority
equivalent, but real rq->rt.highest_prio.curr is less.

The patch below fixes the problem.

Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/49231385567953@web4m.yandex.ru
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-17 15:08:44 +01:00
Mel Gorman 5d4cf996cf sched: Assign correct scheduling domain to 'sd_llc'
Commit 42eb088e (sched: Avoid NULL dereference on sd_busy) corrected a NULL
dereference on sd_busy but the fix also altered what scheduling domain it
used for the 'sd_llc' percpu variable.

One impact of this is that a task selecting a runqueue may consider
idle CPUs that are not cache siblings as candidates for running.
Tasks are then running on CPUs that are not cache hot.

This was found through bisection where ebizzy threads were not seeing equal
performance and it looked like a scheduling fairness issue. This patch
mitigates but does not completely fix the problem on all machines tested
implying there may be an additional bug or a common root cause. Here are
the average range of performance seen by individual ebizzy threads. It
was tested on top of candidate patches related to x86 TLB range flushing.

	4-core machine
			    3.13.0-rc3            3.13.0-rc3
			       vanilla            fixsd-v3r3
	Mean   1        0.00 (  0.00%)        0.00 (  0.00%)
	Mean   2        0.34 (  0.00%)        0.10 ( 70.59%)
	Mean   3        1.29 (  0.00%)        0.93 ( 27.91%)
	Mean   4        7.08 (  0.00%)        0.77 ( 89.12%)
	Mean   5      193.54 (  0.00%)        2.14 ( 98.89%)
	Mean   6      151.12 (  0.00%)        2.06 ( 98.64%)
	Mean   7      115.38 (  0.00%)        2.04 ( 98.23%)
	Mean   8      108.65 (  0.00%)        1.92 ( 98.23%)

	8-core machine
	Mean   1         0.00 (  0.00%)        0.00 (  0.00%)
	Mean   2         0.40 (  0.00%)        0.21 ( 47.50%)
	Mean   3        23.73 (  0.00%)        0.89 ( 96.25%)
	Mean   4        12.79 (  0.00%)        1.04 ( 91.87%)
	Mean   5        13.08 (  0.00%)        2.42 ( 81.50%)
	Mean   6        23.21 (  0.00%)       69.46 (-199.27%)
	Mean   7        15.85 (  0.00%)      101.72 (-541.77%)
	Mean   8       109.37 (  0.00%)       19.13 ( 82.51%)
	Mean   12      124.84 (  0.00%)       28.62 ( 77.07%)
	Mean   16      113.50 (  0.00%)       24.16 ( 78.71%)

It's eliminated for one machine and reduced for another.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: H Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20131217092124.GV11295@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-17 15:08:43 +01:00
Peter Zijlstra 9dbdb15553 sched/fair: Rework sched_fair time accounting
Christian suffers from a bad BIOS that wrecks his i5's TSC sync. This
results in him occasionally seeing time going backwards - which
crashes the scheduler ...

Most of our time accounting can actually handle that except the most
common one; the tick time update of sched_fair.

There is a further problem with that code; previously we assumed that
because we get a tick every TICK_NSEC our time delta could never
exceed 32bits and math was simpler.

However, ever since Frederic managed to get NO_HZ_FULL merged; this is
no longer the case since now a task can run for a long time indeed
without getting a tick. It only takes about ~4.2 seconds to overflow
our u32 in nanoseconds.

This means we not only need to better deal with time going backwards;
but also means we need to be able to deal with large deltas.

This patch reworks the entire code and uses mul_u64_u32_shr() as
proposed by Andy a long while ago.

We express our virtual time scale factor in a u32 multiplier and shift
right and the 32bit mul_u64_u32_shr() implementation reduces to a
single 32x32->64 multiply if the time delta is still short (common
case).

For 64bit a 64x64->128 multiply can be used if ARCH_SUPPORTS_INT128.

Reported-and-Tested-by: Christian Engelmayer <cengelma@gmx.at>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: fweisbec@gmail.com
Cc: Paul Turner <pjt@google.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20131118172706.GI3866@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-11 15:52:35 +01:00
Peter Zijlstra be5e610c0f math64: Add mul_u64_u32_shr()
Introduce mul_u64_u32_shr() as proposed by Andy a while back; it
allows using 64x64->128 muls on 64bit archs and recent GCC
which defines __SIZEOF_INT128__ and __int128.

(This new method will be used by the scheduler.)

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: fweisbec@gmail.com
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-hxjoeuzmrcaumR0uZwjpe2pv@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-11 15:52:34 +01:00
Peter Zijlstra ba1f14fbe7 sched: Remove PREEMPT_NEED_RESCHED from generic code
While hunting a preemption issue with Alexander, Ben noticed that the
currently generic PREEMPT_NEED_RESCHED stuff is horribly broken for
load-store architectures.

We currently rely on the IPI to fold TIF_NEED_RESCHED into
PREEMPT_NEED_RESCHED, but when this IPI lands while we already have
a load for the preempt-count but before the store, the store will erase
the PREEMPT_NEED_RESCHED change.

The current preempt-count only works on load-store archs because
interrupts are assumed to be completely balanced wrt their preempt_count
fiddling; the previous preempt_count load will match the preempt_count
state after the interrupt and therefore nothing gets lost.

This patch removes the PREEMPT_NEED_RESCHED usage from generic code and
pushes it into x86 arch code; the generic code goes back to relying on
TIF_NEED_RESCHED.

Boot tested on x86_64 and compile tested on ppc64.

Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reported-and-Tested-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20131128132641.GP10022@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-11 15:52:32 +01:00
Peter Zijlstra 8e8339a3a1 sched: Initialize power_orig for overlapping groups
Yinghai reported that he saw a /0 in sg_capacity on his EX parts.
Make sure to always initialize power_orig now that we actually use it.

Ideally build_sched_domains() -> init_sched_groups_power() would also
initialize this; but for some yet unexplained reason some setups seem
to miss updates there.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-l8ng2m9uml6fhibln8wqpom7@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-11 15:47:59 +01:00
Linus Torvalds 17b2112f33 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fixes from Ben Herrenschmidt:
 "Here are a handful of powerpc fixes for 3.13.

  The patches are reasonably trivial and self contained.  Note the offb
  patches outside of arch/powerpc, they are LE fixes for our
  open-firmware 'dumb' framebuffer"

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Fix up the kdump base cap to 128M
  powernv: Fix VFIO support with PHB3
  powerpc/52xx: Re-enable bestcomm driver in defconfigs
  powerpc/pasemi: Turn on devtmpfs in defconfig
  offb: Add palette hack for little endian
  offb: Little endian fixes
  powerpc: Fix PTE page address mismatch in pgtable ctor/dtor
  powerpc/44x: Fix ocm_block allocation
  powerpc: Fix build break with PPC_EARLY_DEBUG_BOOTX=y
  powerpc/512x: dts: remove misplaced IRQ spec from 'soc' node
2013-12-09 19:21:39 -08:00
Mahesh Salgaonkar e641eb03ab powerpc: Fix up the kdump base cap to 128M
The current logic sets the kdump base to min of 2G or ppc64_rma_size/2.
On PowerNV kernel the first memory block 'memory@0' can be very large,
equal to the DIMM size with ppc64_rma_size value capped to 1G. Hence on
PowerNV, kdump base is set to 512M resulting kdump to fail while allocating
paca array. This is because, paca need its memory from RMA region capped
at 256M (see allocate_pacas()).

This patch lowers the kdump base cap to 128M so that kdump kernel can
successfully get memory below 256M for paca allocation.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-10 11:28:39 +11:00
Thadeu Lima de Souza Cascardo 08607afba6 powernv: Fix VFIO support with PHB3
I have recently found out that no iommu_groups could be found under
/sys/ on a P8. That prevents PCI passthrough from working.

During my investigation, I found out there seems to be a missing
iommu_register_group for PHB3. The following patch seems to fix the
problem. After applying it, I see iommu_groups under
/sys/kernel/iommu_groups/, and can also bind vfio-pci to an adapter,
which gives me a device at /dev/vfio/.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-10 11:28:38 +11:00
Anatolij Gustschin 84953f969b powerpc/52xx: Re-enable bestcomm driver in defconfigs
The bestcomm driver has been moved to drivers/dma, so to select
this driver by default additionally CONFIG_DMADEVICES has to be
enabled. Currently it is not enabled in the config despite existing
CONFIG_PPC_BESTCOMM=y in the config files. Fix it.

Signed-off-by: Anatolij Gustschin <agust@denx.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-10 11:25:08 +11:00
Olof Johansson fbae00e63d powerpc/pasemi: Turn on devtmpfs in defconfig
At least some distros expect it these days; turn it on. Also, random
churn from doing a savedefconfig for the first time in a year or so.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-10 11:25:08 +11:00
Cedric Le Goater e1edf18b20 offb: Add palette hack for little endian
The pseudo palette color entries need to be ajusted for little
endian.

This patch byteswaps the values in the pseudo palette depending
on the host endian order and the screen depth.

Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-10 11:25:07 +11:00
Cedric Le Goater 212c0cbd5b offb: Little endian fixes
The "screen" properties : depth, width, height, linebytes need
to be converted to the host endian order when read from the device
tree.

The offb_init_palette_hacks() routine also made assumption on the
host endian order.

Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-10 11:25:06 +11:00
Hong H. Pham cf77ee5436 powerpc: Fix PTE page address mismatch in pgtable ctor/dtor
In pte_alloc_one(), pgtable_page_ctor() is passed an address that has
not been converted by page_address() to the newly allocated PTE page.

When the PTE is freed, __pte_free_tlb() calls pgtable_page_dtor()
with an address to the PTE page that has been converted by page_address().
The mismatch in the PTE's page address causes pgtable_page_dtor() to access
invalid memory, so resources for that PTE (such as the page lock) is not
properly cleaned up.

On PPC32, only SMP kernels are affected.

On PPC64, only SMP kernels with 4K page size are affected.

This bug was introduced by commit d614bb0412
"powerpc: Move the pte free routines from common header".

On a preempt-rt kernel, a spinlock is dynamically allocated for each
PTE in pgtable_page_ctor().  When the PTE is freed, calling
pgtable_page_dtor() with a mismatched page address causes a memory leak,
as the pointer to the PTE's spinlock is bogus.

On mainline, there isn't any immediately obvious symptoms, but the
problem still exists here.

Fixes: d614bb0412 "powerpc: Move the pte free routes from common header"
Cc: Paul Mackerras <paulus@samba.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linux-stable <stable@vger.kernel.org> # v3.10+
Signed-off-by: Hong H. Pham <hong.pham@windriver.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-10 11:25:05 +11:00
Ilia Mirkin 1b429835be powerpc/44x: Fix ocm_block allocation
Allocate enough memory for the ocm_block structure, not just a pointer
to it.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-10 11:25:04 +11:00
Michael Ellerman 2d6f0c3ae6 powerpc: Fix build break with PPC_EARLY_DEBUG_BOOTX=y
A kernel configured with PPC_EARLY_DEBUG_BOOTX=y but PPC_PMAC=n and
PPC_MAPLE=n will fail to link:

  btext.c:(.text+0x2d0fc): undefined reference to `.rmci_off'
  btext.c:(.text+0x2d214): undefined reference to `.rmci_on'

Fix it by making the build of rmci_on/off() depend on
PPC_EARLY_DEBUG_BOOTX, which also enable the only code that uses them.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-10 11:25:03 +11:00
Benjamin Herrenschmidt e80ba46120 Merge remote-tracking branch 'agust/merge' into merge
Anatolij Gustschin says:

<<
Please pull a device tree fix for v3.13. The booting on mpc512x
is broken since v3.13-rc1, this patch repairs it.
>>
2013-12-10 11:24:10 +11:00
Linus Torvalds 78fd82238d Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "This is probably a bit big, but just because I fell behind last week
  and didn't get to doing any pulls, so stuff backed up behind me, I
  actually should have sent this for -rc3 but failed to even manage
  that.

  So this has radeon, intel, nouveau, vmware, exynos and tegra fixes in
  it, and the line count isn't all the bad in the end"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (50 commits)
  drm: fix the addition of the side-by-side (half) flag for extra 3D modes
  drm/edid: fix length check when adding extra 3D modes
  drm/radeon/atom: fix bus probes when hw_i2c is set (v2)
  drm/radeon: fix null pointer dereference in dce6+ audio code
  drm/radeon: fixup bad vram size on SI
  udl: fix issue with imported prime buffers
  drm/vmwgfx: Add our connectors to sysfs
  drm/vmwgfx: Fix dma buffer memory size accounting
  drm/vmwgfx: Fix up and comment the dumb buffer implementation
  drm/vmwgfx: Correctly set the enabled state on crtcs
  drm/nv50/disp: min/max are reversed in nv50_crtc_gamma_set()
  drm/nouveau/sw: fix oops if gpu has its display block disabled
  drm/nouveau: unreference fence after syncing
  drm/nouveau/kms: send timestamp data for correct head in flip completion events
  drm/nouveau/clk: Add support for NVAA/NVAC
  drm/nouveau/fifo: Hook up pause and resume for NV50 and NV84+
  drm/nv10/plane: some chipsets don't support NV12
  drm/nv10/plane: add downscaling restrictions
  drm/nv10/plane: fix format computation
  drm/nv04-nv30/clk: provide an empty domain list
  ...
2013-12-09 09:43:07 -08:00
Linus Torvalds 32ac486967 Merge tag 'mfd-fixes-3.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-fixes
Pull mfd fixes from Samuel Ortiz:
 "This is the first 3.13 pull request for MFD fixes.  We have:

   - A ti-ssp build failure fix
   - An as3722 build failure fix
   - An lpc_ich copy paste error fix"

* tag 'mfd-fixes-3.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-fixes:
  mfd: lpc_ich: Fix Wildcat Point info name field
  mfd: ti-ssp: Fix build
  mfd: Make MFD_AS3722 depend on I2C=y
2013-12-09 09:30:21 -08:00
Linus Torvalds 77bd2adb97 Merge tag 'pm-3.13-rc3-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixup from Rafael Wysocki:
 "This reverts two cpufreq commits that fixed issues for some people,
  but broke things for others, so revert them and we'll need to fix the
  original problems differently"

* tag 'pm-3.13-rc3-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Revert "cpufreq: fix garbage kobjects on errors during suspend/resume"
  Revert "cpufreq: suspend governors on system suspend/hibernate"
2013-12-09 09:29:42 -08:00
Linus Torvalds 754ac45745 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
 "An update to ALPS to support devices on Dell XT2 (hopefully working
  better this time around and although it is largish it should not
  affect any other ALPS devices) and a tiny update to Elantech driver to
  support newer devices as well.

  Also a coupe of new input event codes have been defined"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: ALPS - add support for DualPoint device on Dell XT2 model
  Input: elantech - add support for newer (August 2013) devices
  Input: add SW_MUTE_DEVICE switch definition
  Input: usbtouchscreen - separate report and transmit buffer size handling
  Input: sur40 - suppress false uninitialized variable warning
  Input: add key code for ambient light sensor button
  Input: keyboard - "keycode & KEY_MAX" changes some keycode values
2013-12-09 09:28:31 -08:00
Linus Torvalds b6db4eb02e Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
 "Unfortunately the last push that fixed a crash in the crypto
  scatterwalk code introduced a new crash when SG debugging is enabled.
  This fixes that"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: scatterwalk - Use sg_chain_ptr on chain entries
2013-12-09 09:27:47 -08:00