The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.
After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.
Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
are flagged as __cpuinit -- so if we remove the __cpuinit from
arch specific callers, we will also get section mismatch warnings.
As an intermediate step, we intend to turn the linux/init.h cpuinit
content into no-ops as early as possible, since that will get rid
of these warnings. In any case, they are temporary and harmless.
This removes all the arch/metag uses of the __cpuinit macros from
all C files. Currently metag does not have any __CPUINIT used in
assembly files.
[1] https://lkml.org/lkml/2013/5/20/589
Cc: James Hogan <james.hogan@imgtec.com>
Acked-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Pull arch/metag fixes from James Hogan:
"This is just a single fix to fix bad UDP checksums sometimes being
generated to IP addresses *.*.255.255"
* tag 'metag-fixes-for-v3.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
metag: checksum.h: fix carry in csum_tcpudp_nofold
A few remaining architectures directly kill the page faulting task in an
out of memory situation. This is usually not a good idea since that
task might not even use a significant amount of memory and so may not be
the optimal victim to resolve the situation.
Since 2.6.29's 1c0fe6e ("mm: invoke oom-killer from page fault") there
is a hook that architecture page fault handlers are supposed to call to
invoke the OOM killer and let it pick the right task to kill. Convert
the remaining architectures over to this hook.
To have the previous behavior of simply taking out the faulting task the
vm.oom_kill_allocating_task sysctl can be set to 1.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc bits]
Cc: James Hogan <james.hogan@imgtec.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In csum_tcpudp_nofold, add 1 if the carry bit is set after adding the
destination IP address (32 bits) to the checksum (16 bits).
The lack of carry handling for this particular addition meant that a
destination address of *.*.255.255 (e.g. certain broadcasts) sometimes
resulted in an incorrect checksum. This bug has been present in the Meta
port since the code was written in the 2.4 days.
Reported-by: Marcin Nowakowski <Marcin.Nowakowski@pure.com>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Pull Metag architecture changes from James Hogan:
- Infrastructure and DT files for TZ1090 SoC (pin control drivers
already merged via pinctrl tree).
- Panic on boot instead of just warning if cache aliasing possible.
- Various SMP/hotplug fixes.
- Various other randconfig/sparse fixes.
* tag 'metag-for-v3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag: (24 commits)
metag: move EXPORT_SYMBOL(csum_partial) to metag_ksyms.c
metag: cpu hotplug: route_irq: preserve irq mask
metag: kick: add missing irq_enter/exit to kick_handler()
metag: smp: don't spin waiting for CPU to start
metag: smp: enable irqs after set_cpu_online
metag: use clear_tasks_mm_cpumask()
metag: tz1090: select and instantiate pinctrl-tz1090-pdc
metag: tz1090: select and instantiate pinctrl-tz1090
metag: don't check for cache aliasing on smp cpu boot
metag: panic if cache aliasing possible
metag: *.dts: include using preprocessor
metag: add <dt-bindings/> symlink
metag/.gitignore: Extend the *.dtb pattern to match the dtb.S files
metag/traps: include setup.h for the per_cpu_trap_init declaration
metag/traps: Mark die() as __noreturn to match the declaration.
metag/processor.h: Add missing cpuinfo_op declaration.
metag/setup: Restrict scope for the capabilities variable
metag/mm/cache: Restrict scope for metag_lnkget_probe
metag/asm/irq.h: Declare init_IRQ
metag/kernel/irq.c: Declare root_domain as static
...
Merge Kconfig menu diet patches from Dave Hansen:
"I think the "Kernel Hacking" menu has gotten a bit out of hand. It is
over 120 lines long on my system with everything enabled and options
are scattered around it haphazardly.
http://sr71.net/~dave/linux/kconfig-horror.png
Let's try to introduce some sanity. This set takes that 120 lines
down to 55 and makes it vastly easier to find some things. It's a
start.
This set stands on its own, but there is plenty of room for follow-up
patches. The arch-specific debug options still end up getting stuck
in the top-level "kernel hacking" menu. OPTIMIZE_INLINING, for
instance, could obviously go in to the "compiler options" menu, but
the fact that it is defined in arch/ in a separate Kconfig file keeps
it on its own for the moment.
The Signed-off-by's in here look funky. I changed employers while
working on this set, so I have signoffs from both email addresses"
* emailed patches from Dave Hansen <dave@sr71.net>:
hang and lockup detection menu
kconfig: consolidate printk options
group locking debugging options
consolidate compilation option configs
consolidate runtime testing configs
order memory debugging Kconfig options
consolidate per-arch stack overflow debugging options
Move EXPORT_SYMBOL(csum_partial) from lib/checksum.c into metag_ksyms.c
so that it doesn't get omitted by the static linker if it's not used by
any other statically linked code, which can result in undefined symbols
when building modules.
For example a randconfig caused the following error:
ERROR: "csum_partial" [fs/reiserfs/reiserfs.ko] undefined!
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Change signature of free_reserved_area() according to Russell King's
suggestion to fix following build warnings:
arch/arm/mm/init.c: In function 'mem_init':
arch/arm/mm/init.c:603:2: warning: passing argument 1 of 'free_reserved_area' makes integer from pointer without a cast [enabled by default]
free_reserved_area(__va(PHYS_PFN_OFFSET), swapper_pg_dir, 0, NULL);
^
In file included from include/linux/mman.h:4:0,
from arch/arm/mm/init.c:15:
include/linux/mm.h:1301:22: note: expected 'long unsigned int' but argument is of type 'void *'
extern unsigned long free_reserved_area(unsigned long start, unsigned long end,
mm/page_alloc.c: In function 'free_reserved_area':
>> mm/page_alloc.c:5134:3: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast [enabled by default]
In file included from arch/mips/include/asm/page.h:49:0,
from include/linux/mmzone.h:20,
from include/linux/gfp.h:4,
from include/linux/mm.h:8,
from mm/page_alloc.c:18:
arch/mips/include/asm/io.h:119:29: note: expected 'const volatile void *' but argument is of type 'long unsigned int'
mm/page_alloc.c: In function 'free_area_init_nodes':
mm/page_alloc.c:5030:34: warning: array subscript is below array bounds [-Warray-bounds]
Also address some minor code review comments.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: <sworddragon2@aol.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull perf updates from Ingo Molnar:
"Kernel improvements:
- watchdog driver improvements by Li Zefan
- Power7 CPI stack events related improvements by Sukadev Bhattiprolu
- event multiplexing via hrtimers and other improvements by Stephane
Eranian
- kernel stack use optimization by Andrew Hunter
- AMD IOMMU uncore PMU support by Suravee Suthikulpanit
- NMI handling rate-limits by Dave Hansen
- various hw_breakpoint fixes by Oleg Nesterov
- hw_breakpoint overflow period sampling and related signal handling
fixes by Jiri Olsa
- Intel Haswell PMU support by Andi Kleen
Tooling improvements:
- Reset SIGTERM handler in workload child process, fix from David
Ahern.
- Makefile reorganization, prep work for Kconfig patches, from Jiri
Olsa.
- Add automated make test suite, from Jiri Olsa.
- Add --percent-limit option to 'top' and 'report', from Namhyung
Kim.
- Sorting improvements, from Namhyung Kim.
- Expand definition of sysfs format attribute, from Michael Ellerman.
Tooling fixes:
- 'perf tests' fixes from Jiri Olsa.
- Make Power7 CPI stack events available in sysfs, from Sukadev
Bhattiprolu.
- Handle death by SIGTERM in 'perf record', fix from David Ahern.
- Fix printing of perf_event_paranoid message, from David Ahern.
- Handle realloc failures in 'perf kvm', from David Ahern.
- Fix divide by 0 in variance, from David Ahern.
- Save parent pid in thread struct, from David Ahern.
- Handle JITed code in shared memory, from Andi Kleen.
- Fixes for 'perf diff', from Jiri Olsa.
- Remove some unused struct members, from Jiri Olsa.
- Add missing liblk.a dependency for python/perf.so, fix from Jiri
Olsa.
- Respect CROSS_COMPILE in liblk.a, from Rabin Vincent.
- No need to do locking when adding hists in perf report, only 'top'
needs that, from Namhyung Kim.
- Fix alignment of symbol column in in the hists browser (top,
report) when -v is given, from NAmhyung Kim.
- Fix 'perf top' -E option behavior, from Namhyung Kim.
- Fix bug in isupper() and islower(), from Sukadev Bhattiprolu.
- Fix compile errors in bp_signal 'perf test', from Sukadev
Bhattiprolu.
... and more things"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (102 commits)
perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable()
perf/x86: Fix shared register mutual exclusion enforcement
perf/x86/intel: Support full width counting
x86: Add NMI duration tracepoints
perf: Drop sample rate when sampling is too slow
x86: Warn when NMI handlers take large amounts of time
hw_breakpoint: Introduce "struct bp_cpuinfo"
hw_breakpoint: Simplify *register_wide_hw_breakpoint()
hw_breakpoint: Introduce cpumask_of_bp()
hw_breakpoint: Simplify the "weight" usage in toggle_bp_slot() paths
hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths
perf/x86/intel: Add mem-loads/stores support for Haswell
perf/x86/intel: Support Haswell/v4 LBR format
perf/x86/intel: Move NMI clearing to end of PMI handler
perf/x86/intel: Add Haswell PEBS support
perf/x86/intel: Add simple Haswell PMU support
perf/x86/intel: Add Haswell PEBS record support
perf/x86/intel: Fix sparse warning
perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation
perf/x86/amd: Add IOMMU Performance Counter resource management
...
Pull VFS patches (part 1) from Al Viro:
"The major change in this pile is ->readdir() replacement with
->iterate(), dealing with ->f_pos races in ->readdir() instances for
good.
There's a lot more, but I'd prefer to split the pull request into
several stages and this is the first obvious cutoff point."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (67 commits)
[readdir] constify ->actor
[readdir] ->readdir() is gone
[readdir] convert ecryptfs
[readdir] convert coda
[readdir] convert ocfs2
[readdir] convert fatfs
[readdir] convert xfs
[readdir] convert btrfs
[readdir] convert hostfs
[readdir] convert afs
[readdir] convert ncpfs
[readdir] convert hfsplus
[readdir] convert hfs
[readdir] convert befs
[readdir] convert cifs
[readdir] convert freevxfs
[readdir] convert fuse
[readdir] convert hpfs
reiserfs: switch reiserfs_readdir_dentry to inode
reiserfs: is_privroot_deh() needs only directory inode, actually
...
The route_irq() function needs to preserve the irq mask by using the
_irqsave/irqrestore variants of raw spin lock functions instead of the
_irq variants. This is because it is called from __cpu_disable() (via
migrate_irqs()), which is called with IRQs disabled, so using the _irq
variants re-enables IRQs.
This appears to have been causing occasional hits of the
BUG_ON(!irqs_disabled()) in __irq_work_run() during CPU hotplug soak
testing:
BUG: failure at kernel/irq_work.c:122/__irq_work_run()!
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
kick_handler() doesn't have an irq_enter/exit pair, but it's used for
handling SMP IPIs which require work to be done in softirqs, which are
invoked from irq_exit() when the hard irq nest count reaches 0.
The scheduler_ipi() callback in the IPI handler calls irq_enter/exit
itself, but this is inside kick_handler()'s spin lock critical section,
so if an invoked softirq issues an IPI the kick_handler() will be
re-entered on the same CPU and will deadlock.
This is easily fixed by adding the missing irq_enter/exit to
kick_handler() so that the hard irq nest count doesn't reach 0 until
after the spin lock has been released.
Ideally the spin lock protected handler list will also be replaced by a
lockless RCU protected list since it is certainly mostly read. That can
be done in a later change though.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Use a completion to block until a secondary CPU has started up, like ARM
do, instead of a loop of udelays.
On Meta, SMP is really SMT, with each "CPU" being a different hardware
thread on the same Meta processor core, so as well as being more
efficient and latency friendly, using a completion prevents the bogomips
of the secondary CPU from being drastically skewed every time by the
execution of the tight in-cache udelay loop on the other CPU.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
In secondary_start_kernel() interrupts should be enabled with
local_irq_enable() after the cpu is marked as online with
set_cpu_online(). Otherwise it's possible for a timer interrupt to
trigger a softirq, which if the cpu is marked as offline may have it's
affinity altered.
Reported-by: Kirill Tkhai <tkhai@yandex.ru>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Checking for process->mm is not enough because process' main thread may
exit or detach its mm via use_mm(), but other threads may still have a
valid mm.
To fix this we would need to use find_lock_task_mm(), which would walk
up all threads and returns an appropriate task (with task lock held).
clear_tasks_mm_cpumask() was introduced in v3.5-rc1 to fix this issue,
so let's use it for metag too.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Select PINCTRL_TZ1090_PDC from SOC_TZ1090 to enable the PDC pin
controller driver once it is merged, and instantiate it from
tz1090.dtsi.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Grant Likely <grant.likely@linaro.org>
Select PINCTRL and PINCTRL_TZ1090 from SOC_TZ1090 to enable the main pin
controller driver once it is merged, and instantiate it from
tz1090.dtsi.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Grant Likely <grant.likely@linaro.org>
The cache configuration of the boot cpu is now duplicated to secondary
cpus, so there's no need to check for cache aliasing again when a
secondary cpu is booted. Therefore remove the check from
secondary_start_kernel().
Signed-off-by: James Hogan <james.hogan@imgtec.com>
If the cache and page size configuration allows for cache aliasing to
occur we warn on boot, but the log messages are easy to miss and will
result is random crashes occuring in userland. Let's panic too in this
case so that the user immediately knows they need to fix the cache
configuration or configured page size.
Also fix the warning messages which display the cache and page sizes to
include newlines, and add the word "Potential" since an actual cache
alias hasn't been detected.
Signed-off-by: James Hogan <james.hogan@imgtec.com>