Merge branch 'akpm' (Andrew's patch-bomb)

Merge Andrew's second set of patches:
 - MM
 - a few random fixes
 - a couple of RTC leftovers

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits)
  rtc/rtc-88pm80x: remove unneed devm_kfree
  rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
  mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
  tmpfs: distribute interleave better across nodes
  mm: remove redundant initialization
  mm: warn if pg_data_t isn't initialized with zero
  mips: zero out pg_data_t when it's allocated
  memcg: gix memory accounting scalability in shrink_page_list
  mm/sparse: remove index_init_lock
  mm/sparse: more checks on mem_section number
  mm/sparse: optimize sparse_index_alloc
  memcg: add mem_cgroup_from_css() helper
  memcg: further prevent OOM with too many dirty pages
  memcg: prevent OOM with too many dirty pages
  mm: mmu_notifier: fix freed page still mapped in secondary MMU
  mm: memcg: only check anon swapin page charges for swap cache
  mm: memcg: only check swap cache pages for repeated charging
  mm: memcg: split swapin charge function into private and public part
  mm: memcg: remove needless !mm fixup to init_mm when charging
  mm: memcg: remove unneeded shmem charge type
  ...
This commit is contained in:
Linus Torvalds
2012-07-31 19:25:39 -07:00
131 changed files with 3173 additions and 1059 deletions
@@ -0,0 +1,5 @@
What: /proc/sys/vm/nr_pdflush_threads
Date: June 2012
Contact: Wanpeng Li <liwp@linux.vnet.ibm.com>
Description: Since pdflush is replaced by per-BDI flusher, the interface of old pdflush
exported in /proc/sys/vm/ should be removed.
+45
View File
@@ -0,0 +1,45 @@
HugeTLB Controller
-------------------
The HugeTLB controller allows to limit the HugeTLB usage per control group and
enforces the controller limit during page fault. Since HugeTLB doesn't
support page reclaim, enforcing the limit at page fault time implies that,
the application will get SIGBUS signal if it tries to access HugeTLB pages
beyond its limit. This requires the application to know beforehand how much
HugeTLB pages it would require for its use.
HugeTLB controller can be created by first mounting the cgroup filesystem.
# mount -t cgroup -o hugetlb none /sys/fs/cgroup
With the above step, the initial or the parent HugeTLB group becomes
visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
New groups can be created under the parent group /sys/fs/cgroup.
# cd /sys/fs/cgroup
# mkdir g1
# echo $$ > g1/tasks
The above steps create a new group g1 and move the current shell
process (bash) into it.
Brief summary of control files
hugetlb.<hugepagesize>.limit_in_bytes # set/show limit of "hugepagesize" hugetlb usage
hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb usage recorded
hugetlb.<hugepagesize>.usage_in_bytes # show current res_counter usage for "hugepagesize" hugetlb
hugetlb.<hugepagesize>.failcnt # show the number of allocation failure due to HugeTLB limit
For a system supporting two hugepage size (16M and 16G) the control
files include:
hugetlb.16GB.limit_in_bytes
hugetlb.16GB.max_usage_in_bytes
hugetlb.16GB.usage_in_bytes
hugetlb.16GB.failcnt
hugetlb.16MB.limit_in_bytes
hugetlb.16MB.max_usage_in_bytes
hugetlb.16MB.usage_in_bytes
hugetlb.16MB.failcnt
+7 -5
View File
@@ -73,6 +73,8 @@ Brief summary of control files.
memory.kmem.tcp.limit_in_bytes # set/show hard limit for tcp buf memory
memory.kmem.tcp.usage_in_bytes # show current tcp buf memory allocation
memory.kmem.tcp.failcnt # show the number of tcp buf memory usage hits limits
memory.kmem.tcp.max_usage_in_bytes # show max tcp buf memory usage recorded
1. History
@@ -187,12 +189,12 @@ the cgroup that brought it in -- this will happen on memory pressure).
But see section 8.2: when moving a task to another cgroup, its pages may
be recharged to the new cgroup, if move_charge_at_immigrate has been chosen.
Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used.
Exception: If CONFIG_CGROUP_CGROUP_MEMCG_SWAP is not used.
When you do swapoff and make swapped-out pages of shmem(tmpfs) to
be backed into memory in force, charges for pages are accounted against the
caller of swapoff rather than the users of shmem.
2.4 Swap Extension (CONFIG_CGROUP_MEM_RES_CTLR_SWAP)
2.4 Swap Extension (CONFIG_MEMCG_SWAP)
Swap Extension allows you to record charge for swap. A swapped-in page is
charged back to original page allocator if possible.
@@ -259,7 +261,7 @@ When oom event notifier is registered, event will be delivered.
per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by
zone->lru_lock, it has no lock of its own.
2.7 Kernel Memory Extension (CONFIG_CGROUP_MEM_RES_CTLR_KMEM)
2.7 Kernel Memory Extension (CONFIG_MEMCG_KMEM)
With the Kernel memory extension, the Memory Controller is able to limit
the amount of kernel memory used by the system. Kernel memory is fundamentally
@@ -286,8 +288,8 @@ per cgroup, instead of globally.
a. Enable CONFIG_CGROUPS
b. Enable CONFIG_RESOURCE_COUNTERS
c. Enable CONFIG_CGROUP_MEM_RES_CTLR
d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension)
c. Enable CONFIG_MEMCG
d. Enable CONFIG_MEMCG_SWAP (to use swap extension)
1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
# mount -t tmpfs none /sys/fs/cgroup
@@ -13,6 +13,14 @@ Who: Jim Cromie <jim.cromie@gmail.com>, Jason Baron <jbaron@redhat.com>
---------------------------
What: /proc/sys/vm/nr_pdflush_threads
When: 2012
Why: Since pdflush is deprecated, the interface exported in /proc/sys/vm/
should be removed.
Who: Wanpeng Li <liwp@linux.vnet.ibm.com>
---------------------------
What: CONFIG_APM_CPU_IDLE, and its ability to call APM BIOS in idle
When: 2012
Why: This optional sub-feature of APM is of dubious reliability,
+13
View File
@@ -206,6 +206,8 @@ prototypes:
int (*launder_page)(struct page *);
int (*is_partially_uptodate)(struct page *, read_descriptor_t *, unsigned long);
int (*error_remove_page)(struct address_space *, struct page *);
int (*swap_activate)(struct file *);
int (*swap_deactivate)(struct file *);
locking rules:
All except set_page_dirty and freepage may block
@@ -229,6 +231,8 @@ migratepage: yes (both)
launder_page: yes
is_partially_uptodate: yes
error_remove_page: yes
swap_activate: no
swap_deactivate: no
->write_begin(), ->write_end(), ->sync_page() and ->readpage()
may be called from the request handler (/dev/loop).
@@ -330,6 +334,15 @@ cleaned, or an error value if not. Note that in order to prevent the page
getting mapped back in and redirtied, it needs to be kept locked
across the entire operation.
->swap_activate will be called with a non-zero argument on
files backing (non block device backed) swapfiles. A return value
of zero indicates success, in which case this file can be used for
backing swapspace. The swapspace operations will be proxied to the
address space operations.
->swap_deactivate() will be called in the sys_swapoff()
path after ->swap_activate() returned success.
----------------------- file_lock_operations ------------------------------
prototypes:
void (*fl_copy_lock)(struct file_lock *, struct file_lock *);
+12
View File
@@ -592,6 +592,8 @@ struct address_space_operations {
int (*migratepage) (struct page *, struct page *);
int (*launder_page) (struct page *);
int (*error_remove_page) (struct mapping *mapping, struct page *page);
int (*swap_activate)(struct file *);
int (*swap_deactivate)(struct file *);
};
writepage: called by the VM to write a dirty page to backing store.
@@ -760,6 +762,16 @@ struct address_space_operations {
Setting this implies you deal with pages going away under you,
unless you have them locked or reference counts increased.
swap_activate: Called when swapon is used on a file to allocate
space if necessary and pin the block lookup information in
memory. A return value of zero indicates success,
in which case this file can be used to back swapspace. The
swapspace operations will be proxied to this address space's
->swap_{out,in} methods.
swap_deactivate: Called during swapoff on files where swap_activate
was successful.
The File Object
===============
+14 -16
View File
@@ -42,7 +42,6 @@ Currently, these files are in /proc/sys/vm:
- mmap_min_addr
- nr_hugepages
- nr_overcommit_hugepages
- nr_pdflush_threads
- nr_trim_pages (only if CONFIG_MMU=n)
- numa_zonelist_order
- oom_dump_tasks
@@ -426,16 +425,6 @@ See Documentation/vm/hugetlbpage.txt
==============================================================
nr_pdflush_threads
The current number of pdflush threads. This value is read-only.
The value changes according to the number of dirty pages in the system.
When necessary, additional pdflush threads are created, one per second, up to
nr_pdflush_threads_max.
==============================================================
nr_trim_pages
This is available only on NOMMU kernels.
@@ -502,9 +491,10 @@ oom_dump_tasks
Enables a system-wide task dump (excluding kernel threads) to be
produced when the kernel performs an OOM-killing and includes such
information as pid, uid, tgid, vm size, rss, cpu, oom_adj score, and
name. This is helpful to determine why the OOM killer was invoked
and to identify the rogue task that caused it.
information as pid, uid, tgid, vm size, rss, nr_ptes, swapents,
oom_score_adj score, and name. This is helpful to determine why the
OOM killer was invoked, to identify the rogue task that caused it,
and to determine why the OOM killer chose the task it did to kill.
If this is set to zero, this information is suppressed. On very
large systems with thousands of tasks it may not be feasible to dump
@@ -574,16 +564,24 @@ of physical RAM. See above.
page-cluster
page-cluster controls the number of pages which are written to swap in
a single attempt. The swap I/O size.
page-cluster controls the number of pages up to which consecutive pages
are read in from swap in a single attempt. This is the swap counterpart
to page cache readahead.
The mentioned consecutivity is not in terms of virtual/physical addresses,
but consecutive on swap space - that means they were swapped out together.
It is a logarithmic value - setting it to zero means "1 page", setting
it to 1 means "2 pages", setting it to 2 means "4 pages", etc.
Zero disables swap readahead completely.
The default value is three (eight pages at a time). There may be some
small benefits in tuning this to a different value if your workload is
swap-intensive.
Lower values mean lower latencies for initial faults, but at the same time
extra faults and I/O delays for following faults if they would have been part of
that consecutive pages readahead would have brought in.
=============================================================
panic_on_oom
-1
View File
@@ -2353,7 +2353,6 @@ pfm_smpl_buffer_alloc(struct task_struct *task, struct file *filp, pfm_context_t
*/
insert_vm_struct(mm, vma);
mm->total_vm += size >> PAGE_SHIFT;
vm_stat_account(vma->vm_mm, vma->vm_flags, vma->vm_file,
vma_pages(vma));
up_write(&task->mm->mmap_sem);
+1
View File
@@ -401,6 +401,7 @@ static void __init node_mem_init(cnodeid_t node)
* Allocate the node data structures on the node first.
*/
__node_data[node] = __va(slot_freepfn << PAGE_SHIFT);
memset(__node_data[node], 0, PAGE_SIZE);
NODE_DATA(node)->bdata = &bootmem_node_data[node];
NODE_DATA(node)->node_start_pfn = start_pfn;
+2 -2
View File
@@ -21,8 +21,8 @@ CONFIG_CGROUP_DEVICE=y
CONFIG_CPUSETS=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_CGROUP_MEM_RES_CTLR=y
CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y
CONFIG_CGROUP_MEMCG=y
CONFIG_CGROUP_MEMCG_SWAP=y
CONFIG_NAMESPACES=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
+1 -1
View File
@@ -16,7 +16,7 @@ CONFIG_CGROUPS=y
CONFIG_CPUSETS=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_CGROUP_MEM_RES_CTLR=y
CONFIG_CGROUP_MEMCG=y
CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y
CONFIG_CGROUP_SCHED=y
CONFIG_RT_GROUP_SCHED=y
+1 -1
View File
@@ -11,7 +11,7 @@ CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_CGROUP_MEM_RES_CTLR=y
CONFIG_CGROUP_MEMCG=y
CONFIG_BLK_CGROUP=y
CONFIG_NAMESPACES=y
CONFIG_BLK_DEV_INITRD=y
+2 -2
View File
@@ -18,8 +18,8 @@ CONFIG_CPUSETS=y
# CONFIG_PROC_PID_CPUSET is not set
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_CGROUP_MEM_RES_CTLR=y
CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y
CONFIG_CGROUP_MEMCG=y
CONFIG_CGROUP_MEMCG_SWAP=y
CONFIG_CGROUP_SCHED=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_BLK_CGROUP=y
+1 -1
View File
@@ -11,7 +11,7 @@ CONFIG_CGROUP_DEBUG=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_CGROUP_MEM_RES_CTLR=y
CONFIG_CGROUP_MEMCG=y
CONFIG_RELAY=y
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
+1 -1
View File
@@ -13,7 +13,7 @@ CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_CGROUP_MEM_RES_CTLR=y
CONFIG_CGROUP_MEMCG=y
CONFIG_RELAY=y
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
+2 -2
View File
@@ -15,8 +15,8 @@ CONFIG_CPUSETS=y
# CONFIG_PROC_PID_CPUSET is not set
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_CGROUP_MEM_RES_CTLR=y
CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y
CONFIG_CGROUP_MEMCG=y
CONFIG_CGROUP_MEMCG_SWAP=y
CONFIG_CGROUP_SCHED=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_BLK_DEV_INITRD=y
+2 -2
View File
@@ -18,8 +18,8 @@ CONFIG_CGROUP_DEVICE=y
CONFIG_CPUSETS=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_CGROUP_MEM_RES_CTLR=y
CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y
CONFIG_CGROUP_MEMCG=y
CONFIG_CGROUP_MEMCG_SWAP=y
CONFIG_CGROUP_SCHED=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_BLK_CGROUP=y
+2 -2
View File
@@ -17,8 +17,8 @@ CONFIG_CGROUP_DEVICE=y
CONFIG_CPUSETS=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_CGROUP_MEM_RES_CTLR=y
CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y
CONFIG_CGROUP_MEMCG=y
CONFIG_CGROUP_MEMCG_SWAP=y
CONFIG_CGROUP_SCHED=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_BLK_CGROUP=y
+4 -4
View File
@@ -155,10 +155,10 @@ CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_CGROUP_MEM_RES_CTLR=y
CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y
# CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED is not set
# CONFIG_CGROUP_MEM_RES_CTLR_KMEM is not set
CONFIG_CGROUP_MEMCG=y
CONFIG_CGROUP_MEMCG_SWAP=y
# CONFIG_CGROUP_MEMCG_SWAP_ENABLED is not set
# CONFIG_CGROUP_MEMCG_KMEM is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_CFS_BANDWIDTH is not set
+1
View File
@@ -7,6 +7,7 @@ config ZONE_DMA
config XTENSA
def_bool y
select HAVE_IDE
select GENERIC_ATOMIC64
select HAVE_GENERIC_HARDIRQS
select GENERIC_IRQ_SHOW
select GENERIC_CPU_DEVICES

Some files were not shown because too many files have changed in this diff Show More