The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.
After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.
This removes all the powerpc uses of the __cpuinit macros. There
are no __CPUINIT users in assembly files in powerpc.
[1] https://lkml.org/lkml/2013/5/20/589
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We are registering the attribute with permission 0600 but it
doesn't have a store callback, which causes WARN_ON's during
boot. Fix the permission.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
smt_snooze_delay was designed to delay idle loop's nap entry
in the native idle code before it got ported over to use as part of
the cpuidle framework.
A -ve value assigned to smt_snooze_delay should result in
busy looping, in other words disabling the entry to nap state.
- https://lists.ozlabs.org/pipermail/linuxppc-dev/2010-May/082450.html
This particular functionality can be achieved currently by
echo 1 > /sys/devices/system/cpu/cpu*/state1/disable
but it is broken when one assigns -ve value to the smt_snooze_delay
variable either via sysfs entry or ppc64_cpu util.
This patch aims to fix this, by disabling nap state when smt_snooze_delay
variable is set to -ve value.
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When we update the DSCR either via emulation of mtspr(DSCR) or via
a change to dscr_default in sysfs we don't update thread.dscr.
We will eventually update it at context switch time but there is
a period where thread.dscr is incorrect.
If we fork at this point we will copy the old value of thread.dscr
into the child. To avoid this, always keep thread.dscr in sync with
reality.
This issue was found with the following testcase:
http://ozlabs.org/~anton/junkcode/dscr_inherit_test.c
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org> # 3.0+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Writing to dscr_default in sysfs doesn't actually change the DSCR -
we rely on a context switch on each CPU to do the work. There is no
guarantee we will get a context switch in a reasonable amount of time
so fire off an IPI to force an immediate change.
This issue was found with the following test case:
http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org> # 3.0+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is no longer selectable, so just remove all the dependent code.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits)
arm: fix up some samsung merge sysdev conversion problems
firmware: Fix an oops on reading fw_priv->fw in sysfs loading file
Drivers:hv: Fix a bug in vmbus_driver_unregister()
driver core: remove __must_check from device_create_file
debugfs: add missing #ifdef HAS_IOMEM
arm: time.h: remove device.h #include
driver-core: remove sysdev.h usage.
clockevents: remove sysdev.h
arm: convert sysdev_class to a regular subsystem
arm: leds: convert sysdev_class to a regular subsystem
kobject: remove kset_find_obj_hinted()
m86k: gpio - convert sysdev_class to a regular subsystem
mips: txx9_sram - convert sysdev_class to a regular subsystem
mips: 7segled - convert sysdev_class to a regular subsystem
sh: dma - convert sysdev_class to a regular subsystem
sh: intc - convert sysdev_class to a regular subsystem
power: suspend - convert sysdev_class to a regular subsystem
power: qe_ic - convert sysdev_class to a regular subsystem
power: cmm - convert sysdev_class to a regular subsystem
s390: time - convert sysdev_class to a regular subsystem
...
Fix up conflicts with 'struct sysdev' removal from various platform
drivers that got changed:
- arch/arm/mach-exynos/cpu.c
- arch/arm/mach-exynos/irq-eint.c
- arch/arm/mach-s3c64xx/common.c
- arch/arm/mach-s3c64xx/cpu.c
- arch/arm/mach-s5p64x0/cpu.c
- arch/arm/mach-s5pv210/common.c
- arch/arm/plat-samsung/include/plat/cpu.h
- arch/powerpc/kernel/sysfs.c
and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h
This moves the 'memory sysdev_class' over to a regular 'memory' subsystem
and converts the devices to regular devices. The sysdev drivers are
implemented as subsystem interfaces now.
After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch implements a back-end cpuidle driver for pSeries
based on pseries_dedicated_idle_loop and pseries_shared_idle_loop
routines. The driver is built only if CONFIG_CPU_IDLE is set. This
cpuidle driver uses global registration of idle states and
not per-cpu.
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Signed-off-by: Arun R Bharadwaj <arun.r.bharadwaj@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On Fri, Nov 11, 2011 at 10:17:55AM +0530, Ananth N Mavinakayanahalli wrote:
> >
> > At this rate we're going to end up with no bits left for CPU features
> > way too quickly... Especially for something we only care about once at
> > boot time.
> >
> > Wouldn't CPU_FTR_PPCAS_ARCH_V2 be a good enough test ?
>
> /me checks Cell manuals... yes, that test would be good enough. I will
> cook up a patch to use this.
Here it is...
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
All these files were including module.h just for the basic
EXPORT_SYMBOL infrastructure. We can shift them off to the
export.h header which is a way smaller footprint and thus
realize some compile time gains.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The DSCR (aka Data Stream Control Register) is supported on some
server PowerPC chips and allow some control over the prefetch
of data streams.
This patch allows the value to be specified per thread by emulating
the corresponding mfspr and mtspr instructions. Children of such
threads inherit the value. Other threads use a default value that
can be specified in sysfs - /sys/devices/system/cpu/dscr_default.
If a thread starts with non default value in the sysfs entry,
all children threads inherit this non default value even if
the sysfs value is changed later.
Signed-off-by: Alexey Kardashevskiy <aik@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Right now if we want to busy loop and not give up any time to the hypervisor
we put a very large value into smt_snooze_delay. This is sometimes useful
when running a single partition and you want to avoid any latencies due
to the hypervisor or CPU power state transitions. While this works, it's a bit
ugly - how big a number is enough now we have NO_HZ and can be idle for a very
long time.
The patch below makes smt_snooze_delay signed, and a negative value means loop
forever:
echo -1 > /sys/devices/system/cpu/cpu0/smt_snooze_delay
This change shouldn't affect the existing userspace tools (eg ppc64_cpu), but
I'm cc-ing Nathan just to be sure.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
I'm not sure why we have code for parsing an ibm,smt-snooze-delay OF
property. Since we have a smt-snooze-delay= boot option and we can
also set it at runtime via sysfs, it should be safe to get rid of
this code.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Version 3 of this patch is updated with documentation added to
Documentation/ABI. There are no changes to any of the C code from v2
of the patch.
In order to support kernel DLPAR of CPU resources we need to provide an
interface to add (probe) and remove (release) the resource from the system.
This patch Creates new generic probe and release sysfs files to facilitate
cpu probe/release. The probe/release interface provides for allowing each
arch to supply their own routines for implementing the backend of adding
and removing cpus to/from the system.
This also creates the powerpc specific stubs to handle the arch callouts
from writes to the sysfs files.
The creation and use of these files is regulated by the
CONFIG_ARCH_CPU_PROBE_RELEASE option so that only architectures that need the
capability will have the files created.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently there is a bug where if you use oprofile on a pSeries
machine, then use perf_counters, then use oprofile again, oprofile
will not work correctly; it will lose the PMU configuration the next
time the hypervisor does a partition context switch, and thereafter
won't count anything.
Maynard Johnson identified the sequence causing the problem:
- oprofile setup calls ppc_enable_pmcs(), which calls
pseries_lpar_enable_pmcs, which tells the hypervisor that we want
to use the PMU, and sets the "PMU in use" flag in the lppaca.
This flag tells the hypervisor whether it needs to save and restore
the PMU config.
- The perf_counter code sets and clears the "PMU in use" flag directly
as it context-switches the PMU between tasks, and leaves it clear
when it finishes.
- oprofile setup, called for a new oprofile run, calls ppc_enable_pmcs,
which does nothing because it has already been called. In particular
it doesn't set the "PMU in use" flag.
This fixes the problem by arranging for ppc_enable_pmcs to always set
the "PMU in use" flag. It makes the perf_counter code call
ppc_enable_pmcs also rather than calling the lower-level function
directly, and removes the setting of the "PMU in use" flag from
pseries_lpar_enable_pmcs, since that is now done in its caller.
This also removes the declaration of pasemi_enable_pmcs because it
isn't defined anywhere.
Reported-by: Maynard Johnson <mpjohn@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: <stable@kernel.org)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Rusty's patch to change our sysfs access to various registers
to use smp_call_function_single() introduced a whole bunch of
warnings. This fixes them. This version also fixes an actual
bug in here where it did mtspr instead of mfspr when reading
the files
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: performance improvement
This fixes 'powerpc: avoid cpumask games in arch/powerpc/kernel/sysfs.c'
which talked about using smp_call_function_single, but actually used
work_on_cpu (an older version of the patch).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The current code for providing processor cache information in sysfs
has the following deficiencies:
- several complex functions that are hard to understand
- implicit recursion (cache_desc_release -> kobject_put -> cache_desc_release)
- explicit recursion (create_cache_index_info)
- use of two per-cpu arrays when one would suffice
- duplication of work on systems where CPUs share cache
Also, when I looked at implementing support for a shared_cpu_map
attribute, it was pretty much impossible to handle hotplug without
checking every single online CPU's cache_desc list and fixing things
up... not that this is a hot path, but it would have introduced
O(n^2)-ish behavior during boot. Addressing this involved rethinking
the core data structures used, which didn't lend itself to an
incremental approach.
This implementation maintains a "forest" (potentially more than one
tree) of cache objects which reflects the system's cache topology.
Cache objects are instantiated as needed as CPUs come online. A
per-cpu array is used mainly for sysfs-related bookkeeping; the
objects in the array just point to the appropriate points in the
forest.
This maintains compatibility with the existing code and includes some
enhancements:
- Implement the shared_cpu_map attribute, which is essential for
enabling userspace to discover the system's overall cache topology.
- Use cache-block-size properties if cache-line-size is not available.
I chose to place this implementation in a new file since it would have
roughly doubled the size of sysfs.c, which is already kind of messy.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Using the common code means that more complete cache information will
provided in sysfs on platforms that don't use the l2-cache property
convention.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
attr_smt_snooze_delay is only defined for CONFIG_PPC64, so protect the
attribute removal with the same condition. This fixes this build error
on 32-bit SMP configurations:
/data/home/miltonm/next.git/arch/powerpc/kernel/sysfs.c: In function ‘unregister_cpu_online’:
/data/home/miltonm/next.git/arch/powerpc/kernel/sysfs.c:722: error: ‘attr_smt_snooze_delay’ undeclared (first use in this function)
/data/home/miltonm/next.git/arch/powerpc/kernel/sysfs.c:722: error: (Each undeclared identifier is reported only once
/data/home/miltonm/next.git/arch/powerpc/kernel/sysfs.c:722: error: for each function it appears in.)
Signed-off-by: Paul Mackerras <paulus@samba.org>
arch/powerpc/kernel/sysfs.c:197:7: warning: "CONFIG_6xx" is not defined
arch/powerpc/kernel/sysfs.c:141: warning: 'run_on_cpu' defined but not used
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>