In some cases kipmid can use a lot of CPU. This adds a way to tune the
CPU used by kipmid to help in those cases. By setting kipmid_max_busy_us
to a value between 100 and 500, it is possible to bring down kipmid CPU
load to practically 0 without loosing too much ipmi throughput
performance. Not setting the value, or setting the value to zero,
operation is unaffected.
Signed-off-by: Martin Wilck <martin.wilck@ts.fujitsu.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Presently, if panic_on_oom=2, the whole system panics even if the oom
happend in some special situation (as cpuset, mempolicy....). Then,
panic_on_oom=2 means painc_on_oom_always.
Now, memcg doesn't check panic_on_oom flag. This patch adds a check.
BTW, how it's useful ?
kdump+panic_on_oom=2 is the last tool to investigate what happens in
oom-ed system. When a task is killed, the sysytem recovers and there will
be few hint to know what happnes. In mission critical system, oom should
never happen. Then, panic_on_oom=2+kdump is useful to avoid next OOM by
knowing precise information via snapshot.
TODO:
- For memcg, it's for isolate system's memory usage, oom-notiifer and
freeze_at_oom (or rest_at_oom) should be implemented. Then, management
daemon can do similar jobs (as kdump) or taking snapshot per cgroup.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It allows to register multiple memory and memsw thresholds and gets
notifications when it crosses.
To register a threshold application need:
- create an eventfd;
- open memory.usage_in_bytes or memory.memsw.usage_in_bytes;
- write string like "<event_fd> <memory.usage_in_bytes> <threshold>" to
cgroup.event_control.
Application will be notified through eventfd when memory usage crosses
threshold in any direction.
It's applicable for root and non-root cgroup.
It uses stats to track memory usage, simmilar to soft limits. It checks
if we need to send event to userspace on every 100 page in/out. I guess
it's good compromise between performance and accuracy of thresholds.
[akpm@linux-foundation.org: coding-style fixes]
[nishimura@mxp.nes.nec.co.jp: fix documentation merge issue]
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Dan Malek <dan@embeddedalley.com>
Cc: Vladislav Buzov <vbuzov@embeddedalley.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Alexander Shishkin <virtuoso@slind.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset introduces eventfd-based API for notifications in cgroups
and implements memory notifications on top of it.
It uses statistics in memory controler to track memory usage.
Output of time(1) on building kernel on tmpfs:
Root cgroup before changes:
make -j2 506.37 user 60.93s system 193% cpu 4:52.77 total
Non-root cgroup before changes:
make -j2 507.14 user 62.66s system 193% cpu 4:54.74 total
Root cgroup after changes (0 thresholds):
make -j2 507.13 user 62.20s system 193% cpu 4:53.55 total
Non-root cgroup after changes (0 thresholds):
make -j2 507.70 user 64.20s system 193% cpu 4:55.70 total
Root cgroup after changes (1 thresholds, never crossed):
make -j2 506.97 user 62.20s system 193% cpu 4:53.90 total
Non-root cgroup after changes (1 thresholds, never crossed):
make -j2 507.55 user 64.08s system 193% cpu 4:55.63 total
This patch:
Introduce the write-only file "cgroup.event_control" in every cgroup.
To register new notification handler you need:
- create an eventfd;
- open a control file to be monitored. Callbacks register_event() and
unregister_event() must be defined for the control file;
- write "<event_fd> <control_fd> <args>" to cgroup.event_control.
Interpretation of args is defined by control file implementation;
eventfd will be woken up by control file implementation or when the
cgroup is removed.
To unregister notification handler just close eventfd.
If you need notification functionality for a control file you have to
implement callbacks register_event() and unregister_event() in the
struct cftype.
[kamezawa.hiroyu@jp.fujitsu.com: Kconfig fix]
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Dan Malek <dan@embeddedalley.com>
Cc: Vladislav Buzov <vbuzov@embeddedalley.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Alexander Shishkin <virtuoso@slind.org>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch is another core part of this move-charge-at-task-migration
feature. It enables moving charges of anonymous swaps.
To move the charge of swap, we need to exchange swap_cgroup's record.
In current implementation, swap_cgroup's record is protected by:
- page lock: if the entry is on swap cache.
- swap_lock: if the entry is not on swap cache.
This works well in usual swap-in/out activity.
But this behavior make the feature of moving swap charge check many
conditions to exchange swap_cgroup's record safely.
So I changed modification of swap_cgroup's recored(swap_cgroup_record())
to use xchg, and define a new function to cmpxchg swap_cgroup's record.
This patch also enables moving charge of non pte_present but not uncharged
swap caches, which can be exist on swap-out path, by getting the target
pages via find_get_page() as do_mincore() does.
[kosaki.motohiro@jp.fujitsu.com: fix ia64 build]
[akpm@linux-foundation.org: fix typos]
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In current memcg, charges associated with a task aren't moved to the new
cgroup at task migration. Some users feel this behavior to be strange.
These patches are for this feature, that is, for charging to the new
cgroup and, of course, uncharging from the old cgroup at task migration.
This patch adds "memory.move_charge_at_immigrate" file, which is a flag
file to determine whether charges should be moved to the new cgroup at
task migration or not and what type of charges should be moved. This
patch also adds read and write handlers of the file.
This patch also adds no-op handlers for this feature. These handlers will
be implemented in later patches. And you cannot write any values other
than 0 to move_charge_at_immigrate yet.
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provides support for unloading modular subsystems.
This patch adds a new function cgroup_unload_subsys which is to be used
for removing a loaded subsystem during module deletion. Reference
counting of the subsystems' modules is moved from once (at load time) to
once per attached hierarchy (in parse_cgroupfs_options and
rebind_subsystems) (i.e., 0 or 1).
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add interface between cgroups subsystem management and module loading
This patch implements rudimentary module-loading support for cgroups -
namely, a cgroup_load_subsys (similar to cgroup_init_subsys) for use as a
module initcall, and a struct module pointer in struct cgroup_subsys.
Several functions that might be wanted by modules have had EXPORT_SYMBOL
added to them, but it's unclear exactly which functions want it and which
won't.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gmail web gui does not work for sending patches now even with firefox
"view source with" extension. It will use windows style line breaks to
wrap lines automatically when sening email.
Rewrite the gmail web gui part of email client documentation.
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Martin Bligh <mbligh@mbligh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add header file requirements. Compliments of Stephen Rothwell.
Stephen calls this Rule #1, so I put it there, but I didn't want to demote
any of the others in the list, so I made one of them number 2b.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Documentation/vm/:
Expose example and tool source files in the Documentation/ directory in
their own files instead of being buried (almost hidden) in readme/txt files.
This should help to prevent bitrot.
This will make them more visible/usable to users who may need
to use them, to developers who may need to test with them, and
to anyone who would fix/update them if they were more visible.
Also, if any of these possibly should not be in the kernel tree at
all, it will be clearer that they are here and we can discuss if
they should be removed.
Also build the recently-added map_hugetlb.c.
Make several functions static to prevent linker warnings.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Eric B Munson <ebmunson@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Documentation/laptops/laptop-mode.txt:
Expose example and tool source files in the Documentation/ directory in
their own files instead of being buried (almost hidden) in readme/txt files.
This should help to prevent bitrot.
This will make them more visible/usable to users who may need
to use them, to developers who may need to test with them, and
to anyone who would fix/update them if they were more visible.
Also, if any of these possibly should not be in the kernel tree at
all, it will be clearer that they are here and we can discuss if
they should be removed.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Documentation/timers/hpet.txt:
Expose example and tool source files in the Documentation/timers/ directory in
their own files instead of being buried (almost hidden) in readme/txt files.
This should help to prevent bitrot.
This will make them more visible/usable to users who may need
to use them, to developers who may need to test with them, and
to anyone who would fix/update them if they were more visible.
Also, if any of these possibly should not be in the kernel tree at
all, it will be clearer that they are here and we can discuss if
they should be removed.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (62 commits)
msi-laptop: depends on RFKILL
msi-laptop: Detect 3G device exists by standard ec command
msi-laptop: Add resume method for set the SCM load again
msi-laptop: Support some MSI 3G netbook that is need load SCM
msi-laptop: Add threeg sysfs file for support query 3G state by standard 66/62 ec command
msi-laptop: Support standard ec 66/62 command on MSI notebook and nebook
Driver core: create lock/unlock functions for struct device
sysfs: fix for thinko with sysfs_bin_attr_init()
sysfs: Kill unused sysfs_sb variable.
sysfs: Pass super_block to sysfs_get_inode
driver core: Use sysfs_rename_link in device_rename
sysfs: Implement sysfs_rename_link
sysfs: Pack sysfs_dirent more tightly.
sysfs: Serialize updates to the vfs inode
sysfs: windfarm: init sysfs attributes
sysfs: Use sysfs_attr_init and sysfs_bin_attr_init on module dynamic attributes
sysfs: Document sysfs_attr_init and sysfs_bin_attr_init
sysfs: Use sysfs_attr_init and sysfs_bin_attr_init on dynamic attributes
sysfs: Use one lockdep class per sysfs attribute.
sysfs: Only take active references on attributes.
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: (26 commits)
ALSA: hdmi - show debug message on changing audio infoframe
ALSA: hdmi - merge common code for intelhdmi and nvhdmi
ALSA: hda - Add ASRock mobo to MSI blacklist
ALSA: hda: uninitialized variable fix
ALSA: hda: Use LPIB for a Biostar Microtech board
ALSA: usb/audio.h: Fix field order
ALSA: fix jazz16 compile (udelay)
ALSA: hda: Use LPIB for Dell Latitude 131L
ALSA: hda - Build hda_eld into snd-hda-codec module
ALSA: hda - Support NVIDIA MCP89 and GT21x hdmi audio
ALSA: hda - Support max codecs to 8 for nvidia hda controller
ALSA: riptide: clean up while loop
ALSA: usbaudio - remove debug "SAMPLE BYTES" printk line
ALSA: timer - pass real event in snd_timer_notify1() to instance callback
ALSA: oxygen: change || to &&
ALSA: opti92x: use PnP data to select Master Control port
ASoC: fix ak4104 register array access
ASoC: soc_pcm_open: Add missing bailout tag
ALSA: usbaudio: Fix wrong bitrate for Creative Creative VF0470 Live Cam
ALSA: ua101: removing debugging code
...
Constify struct sysfs_ops.
This is part of the ops structure constification
effort started by Arjan van de Ven et al.
Benefits of this constification:
* prevents modification of data that is shared
(referenced) by many other structure instances
at runtime
* detects/prevents accidental (but not intentional)
modification attempts on archs that enforce
read-only kernel data at runtime
* potentially better optimized code as the compiler
can assume that the const data cannot be changed
* the compiler/linker move const data into .rodata
and therefore exclude them from false sharing
Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>