Commit Graph

82 Commits

Author SHA1 Message Date
Seth Jennings cb5e39b803 drivers: base: refactor add_memory_section() to add_memory_block()
Right now memory_dev_init() maintains the memory block pointer
between iterations of add_memory_section().  This is nasty.

This patch refactors add_memory_section() to become add_memory_block().
The refactoring pulls the section scanning out of memory_dev_init()
and simplifies the signature.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-21 11:49:47 -07:00
Seth Jennings 37171e3cb7 drivers: base: remove improper get/put in add_memory_section()
The path through add_memory_section() when the memory block already
exists uses flawed refcounting logic.  A get_device() is done on a
memory block using a pointer that might not be valid as we dropped
our previous reference and didn't obtain a new reference in the
proper way.

Lets stop pretending and just remove the get/put.  The
mem_sysfs_mutex, which we hold over the entire init loop now, will
prevent the memory blocks from disappearing from under us.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-21 11:49:47 -07:00
Seth Jennings 37a7bd6255 drivers: base: reduce add_memory_section() for boot-time only
Now that add_memory_section() is only called from boot time, reduce
the logic and remove the enum.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-21 11:48:41 -07:00
Seth Jennings d7f80530ad drivers: base: unshare add_memory_section() from hotplug
add_memory_section() is currently called from both boot time and run
time via hotplug and there is a lot of nastiness to allow for shared
code including an enum parameter to convey the calling context to
add_memory_section().

This patch is the first step in breaking up the messy code sharing by
pulling the hotplug path for add_memory_section() directly into
register_new_memory().

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-21 11:48:40 -07:00
Seth Jennings df2b717c66 drivers: base: use device get/put functions
Use the [get|put]_device functions for ref'ing the memory block device
rather than the kobject functions which should be hidden away by the
device layer.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-21 11:48:40 -07:00
Seth Jennings 879f1bec8e drivers: base: remove unneeded variable
The error variable is not needed.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-21 11:47:39 -07:00
Seth Jennings b1eaef3da5 drivers: base: move mutex lock out of add_memory_section()
There is no point in releasing the mutex for each section that is added
during boot time.  Just hold it over the entire initialization loop.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-21 11:47:39 -07:00
Jingoo Han 34da5e6770 driver core: replace strict_strto*() with kstrto*()
The usage of strict_strto*() is not preferred, because
strict_strto*() is obsolete. Thus, kstrto*() should be
used.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-26 18:02:43 -07:00
Linus Torvalds f991fae5c6 Merge tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI updates from Rafael Wysocki:
 "This time the total number of ACPI commits is slightly greater than
  the number of cpufreq commits, but Viresh Kumar (who works on cpufreq)
  remains the most active patch submitter.

  To me, the most significant change is the addition of offline/online
  device operations to the driver core (with the Greg's blessing) and
  the related modifications of the ACPI core hotplug code.  Next are the
  freezer updates from Colin Cross that should make the freezing of
  tasks a bit less heavy weight.

  We also have a couple of regression fixes, a number of fixes for
  issues that have not been identified as regressions, two new drivers
  and a bunch of cleanups all over.

  Highlights:

   - Hotplug changes to support graceful hot-removal failures.

     It sometimes is necessary to fail device hot-removal operations
     gracefully if they cannot be carried out completely.  For example,
     if memory from a memory module being hot-removed has been allocated
     for the kernel's own use and cannot be moved elsewhere, it's
     desirable to fail the hot-removal operation in a graceful way
     rather than to crash the kernel, but currenty a success or a kernel
     crash are the only possible outcomes of an attempted memory
     hot-removal.  Needless to say, that is not a very attractive
     alternative and it had to be addressed.

     However, in order to make it work for memory, I first had to make
     it work for CPUs and for this purpose I needed to modify the ACPI
     processor driver.  It's been split into two parts, a resident one
     handling the low-level initialization/cleanup and a modular one
     playing the actual driver's role (but it binds to the CPU system
     device objects rather than to the ACPI device objects representing
     processors).  That's been sort of like a live brain surgery on a
     patient who's riding a bike.

     So this is a little scary, but since we found and fixed a couple of
     regressions it caused to happen during the early linux-next testing
     (a month ago), nobody has complained.

     As a bonus we remove some duplicated ACPI hotplug code, because the
     ACPI-based CPU hotplug is now going to use the common ACPI hotplug
     code.

   - Lighter weight freezing of tasks.

     These changes from Colin Cross and Mandeep Singh Baines are
     targeted at making the freezing of tasks a bit less heavy weight
     operation.  They reduce the number of tasks woken up every time
     during the freezing, by using the observation that the freezer
     simply doesn't need to wake up some of them and wait for them all
     to call refrigerator().  The time needed for the freezer to decide
     to report a failure is reduced too.

     Also reintroduced is the check causing a lockdep warining to
     trigger when try_to_freeze() is called with locks held (which is
     generally unsafe and shouldn't happen).

   - cpufreq updates

     First off, a commit from Srivatsa S Bhat fixes a resume regression
     introduced during the 3.10 cycle causing some cpufreq sysfs
     attributes to return wrong values to user space after resume.  The
     fix is kind of fresh, but also it's pretty obvious once Srivatsa
     has identified the root cause.

     Second, we have a new freqdomain_cpus sysfs attribute for the
     acpi-cpufreq driver to provide information previously available via
     related_cpus.  From Lan Tianyu.

     Finally, we fix a number of issues, mostly related to the
     CPUFREQ_POSTCHANGE notifier and cpufreq Kconfig options and clean
     up some code.  The majority of changes from Viresh Kumar with bits
     from Jacob Shin, Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia,
     Arnd Bergmann, and Tang Yuantian.

   - ACPICA update

     A usual bunch of updates from the ACPICA upstream.

     During the 3.4 cycle we introduced support for ACPI 5 extended
     sleep registers, but they are only supposed to be used if the
     HW-reduced mode bit is set in the FADT flags and the code attempted
     to use them without checking that bit.  That caused suspend/resume
     regressions to happen on some systems.  Fix from Lv Zheng causes
     those registers to be used only if the HW-reduced mode bit is set.

     Apart from this some other ACPICA bugs are fixed and code cleanups
     are made by Bob Moore, Tomasz Nowicki, Lv Zheng, Chao Guan, and
     Zhang Rui.

   - cpuidle updates

     New driver for Xilinx Zynq processors is added by Michal Simek.

     Multidriver support simplification, addition of some missing
     kerneldoc comments and Kconfig-related fixes come from Daniel
     Lezcano.

   - ACPI power management updates

     Changes to make suspend/resume work correctly in Xen guests from
     Konrad Rzeszutek Wilk, sparse warning fix from Fengguang Wu and
     cleanups and fixes of the ACPI device power state selection
     routine.

   - ACPI documentation updates

     Some previously missing pieces of ACPI documentation are added by
     Lv Zheng and Aaron Lu (hopefully, that will help people to
     uderstand how the ACPI subsystem works) and one outdated doc is
     updated by Hanjun Guo.

   - Assorted ACPI updates

     We finally nailed down the IA-64 issue that was the reason for
     reverting commit 9f29ab11dd ("ACPI / scan: do not match drivers
     against objects having scan handlers"), so we can fix it and move
     the ACPI scan handler check added to the ACPI video driver back to
     the core.

     A mechanism for adding CMOS RTC address space handlers is
     introduced by Lan Tianyu to allow some EC-related breakage to be
     fixed on some systems.

     A spec-compliant implementation of acpi_os_get_timer() is added by
     Mika Westerberg.

     The evaluation of _STA is added to do_acpi_find_child() to avoid
     situations in which a pointer to a disabled device object is
     returned instead of an enabled one with the same _ADR value.  From
     Jeff Wu.

     Intel BayTrail PCH (Platform Controller Hub) support is added to
     the ACPI driver for Intel Low-Power Subsystems (LPSS) and that
     driver is modified to work around a couple of known BIOS issues.
     Changes from Mika Westerberg and Heikki Krogerus.

     The EC driver is fixed by Vasiliy Kulikov to use get_user() and
     put_user() instead of dereferencing user space pointers blindly.

     Code cleanups are made by Bjorn Helgaas, Nicholas Mazzuca and Toshi
     Kani.

   - Assorted power management updates

     The "runtime idle" helper routine is changed to take the return
     values of the callbacks executed by it into account and to call
     rpm_suspend() if they return 0, which allows us to reduce the
     overall code bloat a bit (by dropping some code that's not
     necessary any more after that modification).

     The runtime PM documentation is updated by Alan Stern (to reflect
     the "runtime idle" behavior change).

     New trace points for PM QoS are added by Sahara
     (<keun-o.park@windriver.com>).

     PM QoS documentation is updated by Lan Tianyu.

     Code cleanups are made and minor issues are addressed by Bernie
     Thompson, Bjorn Helgaas, Julius Werner, and Shuah Khan.

   - devfreq updates

     New driver for the Exynos5-bus device from Abhilash Kesavan.

     Minor cleanups, fixes and MAINTAINERS update from MyungJoo Ham,
     Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and Wei Yongjun.

   - OMAP power management updates

     Adaptive Voltage Scaling (AVS) SmartReflex voltage control driver
     updates from Andrii Tseglytskyi and Nishanth Menon."

* tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (162 commits)
  cpufreq: Fix cpufreq regression after suspend/resume
  ACPI / PM: Fix possible NULL pointer deref in acpi_pm_device_sleep_state()
  PM / Sleep: Warn about system time after resume with pm_trace
  cpufreq: don't leave stale policy pointer in cdbs->cur_policy
  acpi-cpufreq: Add new sysfs attribute freqdomain_cpus
  cpufreq: make sure frequency transitions are serialized
  ACPI: implement acpi_os_get_timer() according the spec
  ACPI / EC: Add HP Folio 13 to ec_dmi_table in order to skip DSDT scan
  ACPI: Add CMOS RTC Operation Region handler support
  ACPI / processor: Drop unused variable from processor_perflib.c
  cpufreq: tegra: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: s3c64xx: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: omap: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: imx6q: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: exynos: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: dbx500: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: davinci: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: arm-big-little: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: powernow-k8: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: pcc: call CPUFREQ_POSTCHANGE notfier in error cases
  ...
2013-07-03 14:35:40 -07:00
Nathan Fontenot 96b2c0fc8e drivers/base: Use attribute groups to create sysfs memory files
Update the sysfs memory code to create/delete files at the time of device
and subsystem registration.

The current code creates files in the root memory directory explicitly through
the use of init_* routines. The files for each memory block are created and
deleted explicitly using the mem_[create|delete]_simple_file macros.

This patch creates attribute groups for the memory root files and files in
each memory block directory so that they are created and deleted implicitly
at subsys and device register and unregister time.

This did necessitate moving the register_memory() updating it to set the
dev.groups field.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-06 12:38:11 -07:00
Rafael J. Wysocki ea50be5934 Driver core / MM: Drop offline_memory_block()
Since offline_memory_block(mem) is functionally equivalent to
device_offline(&mem->dev), make the only caller of the former use
the latter instead and drop offline_memory_block() entirely.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Toshi Kani <toshi.kani@hp.com>
2013-06-01 21:37:10 +02:00
Rafael J. Wysocki b2c064b25a Driver core / memory: Simplify __memory_block_change_state()
As noted by Tang Chen, the last_online field in struct memory_block
introduced by commit 4960e05 (Driver core: Introduce offline/online
callbacks for memory blocks) is not really necessary, because
online_pages() restores the previous state if passed ONLINE_KEEP as
the last argument.  Therefore, remove that field along with the code
referring to it.

References: http://marc.info/?l=linux-kernel&m=136919777305599&w=2
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
2013-06-01 21:37:09 +02:00
Rafael J. Wysocki 4960e05e22 Driver core: Introduce offline/online callbacks for memory blocks
Introduce .offline() and .online() callbacks for memory_subsys
that will allow the generic device_offline() and device_online()
to be used with device objects representing memory blocks.  That,
in turn, allows the ACPI subsystem to use device_offline() to put
removable memory blocks offline, if possible, before removing
memory modules holding them.

The 'online' sysfs attribute of memory block devices will attempt to
put them offline if 0 is written to it and will attempt to apply the
previously used online type when onlining them (i.e. when 1 is
written to it).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
2013-05-12 14:14:45 +02:00
Tang Chen 6056d619a8 mm: Remove unused parameter of pages_correctly_reserved()
nr_pages is not used in pages_correctly_reserved().
So remove it.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:38 -07:00
David Rientjes 4edd7ceff0 mm, hotplug: avoid compiling memory hotremove functions when disabled
__remove_pages() is only necessary for CONFIG_MEMORY_HOTREMOVE.  PowerPC
pseries will return -EOPNOTSUPP if unsupported.

Adding an #ifdef causes several other functions it depends on to also
become unnecessary, which saves in .text when disabled (it's disabled in
most defconfigs besides powerpc, including x86).  remove_memory_block()
becomes static since it is not referenced outside of
drivers/base/memory.c.

Build tested on x86 and powerpc with CONFIG_MEMORY_HOTREMOVE both enabled
and disabled.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:37 -07:00
Yasuaki Ishimatsu 6677e3eaf4 memory-hotplug: check whether all memory blocks are offlined or not when removing memory
We remove the memory like this:

 1. lock memory hotplug
 2. offline a memory block
 3. unlock memory hotplug
 4. repeat 1-3 to offline all memory blocks
 5. lock memory hotplug
 6. remove memory(TODO)
 7. unlock memory hotplug

All memory blocks must be offlined before removing memory.  But we don't
hold the lock in the whole operation.  So we should check whether all
memory blocks are offlined before step6.  Otherwise, kernel maybe
panicked.

Offlining a memory block and removing a memory device can be two
different operations.  Users can just offline some memory blocks without
removing the memory device.  For this purpose, the kernel has held
lock_memory_hotplug() in __offline_pages().  To reuse the code for
memory hot-remove, we repeat step 1-3 to offline all the memory blocks,
repeatedly lock and unlock memory hotplug, but not hold the memory
hotplug lock in the whole operation.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:11 -08:00
Felipe Balbi 74fef7a8fd base: memory: fix soft/hard_offline_page permissions
those two sysfs files don't have a 'show' method,
so they shouldn't have a read permission. Thanks
to Greg Kroah-Hartman for actually looking into
the source code and figuring out we had a real bug
with these two files.

Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-18 11:18:13 -08:00
Lai Jiangshan 511c2aba8f mm, memory-hotplug: dynamic configure movable memory and portion memory
Add online_movable and online_kernel for logic memory hotplug.  This is
the dynamic version of "movablecore" & "kernelcore".

We have the same reason to introduce it as to introduce "movablecore" &
"kernelcore".  It has the same motive as "movablecore" & "kernelcore", but
it is dynamic/running-time:

o We can configure memory as kernelcore or movablecore after boot.

  Userspace workload is increased, we need more hugepage, we can't use
  "online_movable" to add memory and allow the system use more
  THP(transparent-huge-page), vice-verse when kernel workload is increase.

  Also help for virtualization to dynamic configure host/guest's memory,
  to save/(reduce waste) memory.

  Memory capacity on Demand

o When a new node is physically online after boot, we need to use
  "online_movable" or "online_kernel" to configure/portion it as we
  expected when we logic-online it.

  This configuration also helps for physically-memory-migrate.

o all benefit as the same as existed "movablecore" & "kernelcore".

o Preparing for movable-node, which is very important for power-saving,
  hardware partitioning and high-available-system(hardware fault
  management).

(Note, we don't introduce movable-node here.)

Action behavior:
When a memoryblock/memorysection is onlined by "online_movable", the kernel
will not have directly reference to the page of the memoryblock,
thus we can remove that memory any time when needed.

When it is online by "online_kernel", the kernel can use it.
When it is online by "online", the zone type doesn't changed.

Current constraints:
Only the memoryblock which is adjacent to the ZONE_MOVABLE
can be online from ZONE_NORMAL to ZONE_MOVABLE.

[akpm@linux-foundation.org: use min_t, cleanups]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:28 -08:00
Yasuaki Ishimatsu fa7194eb99 memory hotplug: suppress "Device memoryX does not have a release() function" warning
When calling remove_memory_block(), the function shows following message
at device_release().

"Device 'memory528' does not have a release() function, it is broken and
must be fixed."

The reason is memory_block's device struct does not have a release()
function.

So the patch registers memory_block_release() to the device's release()
function for suppressing the warning message.  Additionally, the patch
moves kfree(mem) into the release function since the release function is
prepared as a means to free a memory_block struct.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:22 -08:00
Wen Congyang e90bdb7f52 memory-hotplug: update memory block's state and notify userspace
remove_memory() will be called when hot removing a memory device.  But
even if offlining memory, we cannot notice it.  So the patch updates the
memory block's state and sends notification to userspace.

Additionally, the memory device may contain more than one memory block.
If the memory block has been offlined, __offline_pages() will fail.  So we
should try to offline one memory block at a time.

Thus remove_memory() also check each memory block's state.  So there is no
need to check the memory block's state before calling remove_memory().

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:23:02 +09:00
Wen Congyang a16cee10c7 memory-hotplug: preparation to notify memory block's state at memory hot remove
remove_memory() is called in two cases:
1. echo offline >/sys/devices/system/memory/memoryXX/state
2. hot remove a memory device

In the 1st case, the memory block's state is changed and the notification
that memory block's state changed is sent to userland after calling
remove_memory().  So user can notice memory block is changed.

But in the 2nd case, the memory block's state is not changed and the
notification is not also sent to userspcae even if calling
remove_memory().  So user cannot notice memory block is changed.

For adding the notification at memory hot remove, the patch just prepare
as follows:
1st case uses offline_pages() for offlining memory.
2nd case uses remove_memory() for offlining memory and changing memory block's
    state and notifing the information.

The patch does not implement notification to remove_memory().

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:23:02 +09:00
Ingo Molnar cd593accdc Merge tag 'v3.3-rc7' into x86/mce
Merge reason: Update from an ancient -rc1 base to an almost-final stable kernel.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-14 07:44:11 +01:00
Yinghai Lu 321bf4ed5f drivers/base/memory.c: fix memory_dev_init() long delay
One system with 2048g ram, reported soft lockup on recent kernel.

[   34.426749] cpu_dev_init done
[   61.166399] BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
[   61.166733] Modules linked in:
[   61.166904] irq event stamp: 1935610
[   61.178431] hardirqs last  enabled at (1935609): [<ffffffff81ce8c05>] mutex_lock_nested+0x299/0x2b4
[   61.178923] hardirqs last disabled at (1935610): [<ffffffff81cf2bab>] apic_timer_interrupt+0x6b/0x80
[   61.198767] softirqs last  enabled at (1935476): [<ffffffff8106e59c>] __do_softirq+0x195/0x1ab
[   61.218604] softirqs last disabled at (1935471): [<ffffffff81cf359c>] call_softirq+0x1c/0x30
[   61.238408] CPU 0
[   61.238549] Modules linked in:
[   61.238744]
[   61.238825] Pid: 1, comm: swapper/0 Not tainted 3.3.0-rc1-tip-yh-02076-g962f689-dirty #171
[   61.278212] RIP: 0010:[<ffffffff810b3e3a>]  [<ffffffff810b3e3a>] lock_release+0x90/0x9c
[   61.278627] RSP: 0018:ffff883f64dbfd70  EFLAGS: 00000246
[   61.298287] RAX: ffff883f64dc0000 RBX: 0000000000000000 RCX: 000000000000008b
[   61.298690] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[   61.318383] RBP: ffff883f64dbfda0 R08: 0000000000000001 R09: 000000000000008b
[   61.338215] R10: 0000000000000000 R11: 0000000000000000 R12: ffff883f64dbfd10
[   61.338610] R13: ffff883f64dc0708 R14: ffff883f64dc0708 R15: ffffffff81095657
[   61.358299] FS:  0000000000000000(0000) GS:ffff883f7d600000(0000) knlGS:0000000000000000
[   61.378118] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   61.378450] CR2: 0000000000000000 CR3: 00000000024af000 CR4: 00000000000007f0
[   61.398144] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   61.417918] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   61.418260] Process swapper/0 (pid: 1, threadinfo ffff883f64dbe000, task ffff883f64dc0000)
[   61.445358] Stack:
[   61.445511]  0000000000000002 ffff897f649ba168 ffff883f64dbfe10 ffff88ff64bb57a8
[   61.458040]  0000000000000000 0000000000000000 ffff883f64dbfdc0 ffffffff81ceb1b4
[   61.458491]  000000000011608c ffff88ff64bb58a8 ffff883f64dbfdf0 ffffffff81c57638
[   61.478215] Call Trace:
[   61.478367]  [<ffffffff81ceb1b4>] _raw_spin_unlock+0x21/0x2e
[   61.497994]  [<ffffffff81c57638>] klist_next+0x9e/0xbc
[   61.498264]  [<ffffffff8148ba99>] next_device+0xe/0x1e
[   61.517867]  [<ffffffff8148c0cc>] subsys_find_device_by_id+0xb7/0xd6
[   61.518197]  [<ffffffff81498846>] find_memory_block_hinted+0x3d/0x66
[   61.537927]  [<ffffffff8149887f>] find_memory_block+0x10/0x12
[   61.538193]  [<ffffffff814988b6>] add_memory_section+0x35/0x9e
[   61.557932]  [<ffffffff827fecef>] memory_dev_init+0x68/0xda
[   61.558227]  [<ffffffff827fec01>] driver_init+0x97/0xa7
[   61.577853]  [<ffffffff827cdf3c>] kernel_init+0xf6/0x1c0
[   61.578140]  [<ffffffff81cf34a4>] kernel_thread_helper+0x4/0x10
[   61.597850]  [<ffffffff81ceb59d>] ? retint_restore_args+0xe/0xe
[   61.598144]  [<ffffffff827cde46>] ? start_kernel+0x3ab/0x3ab
[   61.617826]  [<ffffffff81cf34a0>] ? gs_change+0xb/0xb
[   61.618060] Code: 10 48 83 3b 00 eb e8 4c 89 f2 44 89 fe 4c 89 ef e8 e1 fe ff ff 65 48 8b 04 25 40 bc 00 00 c7 80 cc 06 00 00 00 00 00 00 41 54 9d <5e> 5b 41 5c 41 5d 41 5e 41 5f 5d c3 55 48 89 e5 41 57 41 89 cf
[   89.285380] memory_dev_init done

Finally it takes about 55s to create 16400 memory entries.

Root cause: for x86_64, 2048g (with 2g hole at [2g,4g), and TOP2 will be 2050g), will have 16400 memory block.

find_memory_block/subsys_find_device_by_id will be expensive with that many entries.

Actually, we don't need to find that memory block for BOOT path.

Skip that finding make it get back to normal.

[   34.466696] cpu_dev_init done
[   35.290080] memory_dev_init done

Also solved the delay with topology_init when sections_per_block is not 1.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Nathan Fontenot <nfont@austin.ibm.com>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-02 15:50:05 -08:00
Ingo Molnar 4e9f44ba29 Merge tag 'mce-recovery-for-tip' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/mce
Implement MCE recovery for the data load error path and assorted cleanups.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-26 11:40:13 +01:00
Michael Holzheu f5138e4221 kdump: add udev events for memory online/offline
Currently no udev events for memory hotplug "online" and "offline" are
generated:

  # udevadm monitor
  # echo offline > /sys/devices/system/memory/memory4/state
  ==> No event

When kdump is loaded, kexec detects the current memory configuration and
stores it in the pre-allocated ELF core header.  Therefore, for kdump it
is necessary to reload the kdump kernel with kexec when the memory
configuration changes (e.g.  for online/offline hotplug memory).

In order to do this automatically, udev rules should be used.  This kernel
patch adds udev events for "online" and "offline".  Together with this
kernel patch, the following udev rules for online/offline have to be added
to "/etc/udev/rules.d/98-kexec.rules":

  SUBSYSTEM=="memory", ACTION=="online", PROGRAM="/etc/init.d/kdump restart"
  SUBSYSTEM=="memory", ACTION=="offline", PROGRAM="/etc/init.d/kdump restart"

[sfr@canb.auug.org.au: fixups for class to subsystem conversion]
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:11 -08:00