Merge branch 'akpm' (patches from Andrew)

Merge second patch-bomb from Andrew Morton:
 "Almost all of the rest of MM.  There was an unusually large amount of
  MM material this time"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (141 commits)
  zpool: remove no-op module init/exit
  mm: zbud: constify the zbud_ops
  mm: zpool: constify the zpool_ops
  mm: swap: zswap: maybe_preload & refactoring
  zram: unify error reporting
  zsmalloc: remove null check from destroy_handle_cache()
  zsmalloc: do not take class lock in zs_shrinker_count()
  zsmalloc: use class->pages_per_zspage
  zsmalloc: consider ZS_ALMOST_FULL as migrate source
  zsmalloc: partial page ordering within a fullness_list
  zsmalloc: use shrinker to trigger auto-compaction
  zsmalloc: account the number of compacted pages
  zsmalloc/zram: introduce zs_pool_stats api
  zsmalloc: cosmetic compaction code adjustments
  zsmalloc: introduce zs_can_compact() function
  zsmalloc: always keep per-class stats
  zsmalloc: drop unused variable `nr_to_migrate'
  mm/memblock.c: fix comment in __next_mem_range()
  mm/page_alloc.c: fix type information of memoryless node
  memory-hotplug: fix comments in zone_spanned_pages_in_node() and zone_spanned_pages_in_node()
  ...
This commit is contained in:
Linus Torvalds
2015-09-08 17:52:23 -07:00
99 changed files with 2784 additions and 1569 deletions
+7
View File
@@ -104,6 +104,13 @@ crossing restrictions, pass 0 for alloc; passing 4096 says memory allocated
from this pool must not cross 4KByte boundaries.
void *dma_pool_zalloc(struct dma_pool *pool, gfp_t mem_flags,
dma_addr_t *handle)
Wraps dma_pool_alloc() and also zeroes the returned memory if the
allocation attempt succeeded.
void *dma_pool_alloc(struct dma_pool *pool, gfp_t gfp_flags,
dma_addr_t *dma_handle);
+2 -1
View File
@@ -144,7 +144,8 @@ mem_used_max RW the maximum amount memory zram have consumed to
store compressed data
mem_limit RW the maximum amount of memory ZRAM can use to store
the compressed data
num_migrated RO the number of objects migrated migrated by compaction
pages_compacted RO the number of pages freed during compaction
(available only via zram<id>/mm_stat node)
compact WO trigger memory compaction
WARNING
+4 -3
View File
@@ -60,9 +60,10 @@ Filesystem support consists of
- implementing the direct_IO address space operation, and calling
dax_do_io() instead of blockdev_direct_IO() if S_DAX is set
- implementing an mmap file operation for DAX files which sets the
VM_MIXEDMAP flag on the VMA, and setting the vm_ops to include handlers
for fault and page_mkwrite (which should probably call dax_fault() and
dax_mkwrite(), passing the appropriate get_block() callback)
VM_MIXEDMAP and VM_HUGEPAGE flags on the VMA, and setting the vm_ops to
include handlers for fault, pmd_fault and page_mkwrite (which should
probably call dax_fault(), dax_pmd_fault() and dax_mkwrite(), passing the
appropriate get_block() callback)
- calling dax_truncate_page() instead of block_truncate_page() for DAX files
- calling dax_zero_page_range() instead of zero_user() for DAX files
- ensuring that there is sufficient locking between reads, writes,
+13 -5
View File
@@ -424,6 +424,7 @@ Private_Dirty: 0 kB
Referenced: 892 kB
Anonymous: 0 kB
Swap: 0 kB
SwapPss: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 374 kB
@@ -433,16 +434,23 @@ the first of these lines shows the same information as is displayed for the
mapping in /proc/PID/maps. The remaining lines show the size of the mapping
(size), the amount of the mapping that is currently resident in RAM (RSS), the
process' proportional share of this mapping (PSS), the number of clean and
dirty private pages in the mapping. Note that even a page which is part of a
MAP_SHARED mapping, but has only a single pte mapped, i.e. is currently used
by only one process, is accounted as private and not as shared. "Referenced"
indicates the amount of memory currently marked as referenced or accessed.
dirty private pages in the mapping.
The "proportional set size" (PSS) of a process is the count of pages it has
in memory, where each page is divided by the number of processes sharing it.
So if a process has 1000 pages all to itself, and 1000 shared with one other
process, its PSS will be 1500.
Note that even a page which is part of a MAP_SHARED mapping, but has only
a single pte mapped, i.e. is currently used by only one process, is accounted
as private and not as shared.
"Referenced" indicates the amount of memory currently marked as referenced or
accessed.
"Anonymous" shows the amount of memory that does not belong to any file. Even
a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
and a page is modified, the file page is replaced by a private anonymous copy.
"Swap" shows how much would-be-anonymous memory is also used, but out on
swap.
"SwapPss" shows proportional swap share of this mapping.
"VmFlags" field deserves a separate description. This member represents the kernel
flags associated with the particular virtual memory area in two letter encoded
manner. The codes are the following:
+2 -2
View File
@@ -349,7 +349,7 @@ zone[i]'s protection[j] is calculated by following expression.
(i < j):
zone[i]->protection[j]
= (total sums of present_pages from zone[i+1] to zone[j] on the node)
= (total sums of managed_pages from zone[i+1] to zone[j] on the node)
/ lowmem_reserve_ratio[i];
(i = j):
(should not be protected. = 0;
@@ -360,7 +360,7 @@ The default values of lowmem_reserve_ratio[i] are
256 (if zone[i] means DMA or DMA32 zone)
32 (others).
As above expression, they are reciprocal number of ratio.
256 means 1/256. # of protection pages becomes about "0.39%" of total present
256 means 1/256. # of protection pages becomes about "0.39%" of total managed
pages of higher zones on the node.
If you would like to protect more pages, smaller values are effective.
+2 -1
View File
@@ -75,7 +75,8 @@ On all - write a character to /proc/sysrq-trigger. e.g.:
'e' - Send a SIGTERM to all processes, except for init.
'f' - Will call oom_kill to kill a memory hog process.
'f' - Will call the oom killer to kill a memory hog process, but do not
panic if nothing can be killed.
'g' - Used by kgdb (kernel debugger)
+11 -4
View File
@@ -329,7 +329,14 @@ Examples
3) hugepage-mmap: see tools/testing/selftests/vm/hugepage-mmap.c
4) The libhugetlbfs (http://libhugetlbfs.sourceforge.net) library provides a
wide range of userspace tools to help with huge page usability, environment
setup, and control. Furthermore it provides useful test cases that should be
used when modifying code to ensure no regressions are introduced.
4) The libhugetlbfs (https://github.com/libhugetlbfs/libhugetlbfs) library
provides a wide range of userspace tools to help with huge page usability,
environment setup, and control.
Kernel development regression testing
=====================================
The most complete set of hugetlb tests are in the libhugetlbfs repository.
If you modify any hugetlb related code, use the libhugetlbfs test suite
to check for regressions. In addition, if you add any new hugetlb
functionality, please add appropriate tests to libhugetlbfs.
+13 -2
View File
@@ -16,11 +16,17 @@ There are three components to pagemap:
* Bits 0-4 swap type if swapped
* Bits 5-54 swap offset if swapped
* Bit 55 pte is soft-dirty (see Documentation/vm/soft-dirty.txt)
* Bits 56-60 zero
* Bit 61 page is file-page or shared-anon
* Bit 56 page exclusively mapped (since 4.2)
* Bits 57-60 zero
* Bit 61 page is file-page or shared-anon (since 3.5)
* Bit 62 page swapped
* Bit 63 page present
Since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get PFNs.
In 4.0 and 4.1 opens by unprivileged fail with -EPERM. Starting from
4.2 the PFN field is zeroed if the user does not have CAP_SYS_ADMIN.
Reason: information about PFNs helps in exploiting Rowhammer vulnerability.
If the page is not present but in swap, then the PFN contains an
encoding of the swap file number and the page's offset into the
swap. Unmapped pages return a null PFN. This allows determining
@@ -159,3 +165,8 @@ Other notes:
Reading from any of the files will return -EINVAL if you are not starting
the read on an 8-byte boundary (e.g., if you sought an odd number of bytes
into the file), or if the size of the read is not a multiple of 8 bytes.
Before Linux 3.11 pagemap bits 55-60 were used for "page-shift" (which is
always 12 at most architectures). Since Linux 3.11 their meaning changes
after first clear of soft-dirty bits. Since Linux 4.2 they are used for
flags unconditionally.
+62
View File
@@ -339,6 +339,67 @@ static void __init request_standard_resources(void)
}
}
#ifdef CONFIG_BLK_DEV_INITRD
/*
* Relocate initrd if it is not completely within the linear mapping.
* This would be the case if mem= cuts out all or part of it.
*/
static void __init relocate_initrd(void)
{
phys_addr_t orig_start = __virt_to_phys(initrd_start);
phys_addr_t orig_end = __virt_to_phys(initrd_end);
phys_addr_t ram_end = memblock_end_of_DRAM();
phys_addr_t new_start;
unsigned long size, to_free = 0;
void *dest;
if (orig_end <= ram_end)
return;
/*
* Any of the original initrd which overlaps the linear map should
* be freed after relocating.
*/
if (orig_start < ram_end)
to_free = ram_end - orig_start;
size = orig_end - orig_start;
/* initrd needs to be relocated completely inside linear mapping */
new_start = memblock_find_in_range(0, PFN_PHYS(max_pfn),
size, PAGE_SIZE);
if (!new_start)
panic("Cannot relocate initrd of size %ld\n", size);
memblock_reserve(new_start, size);
initrd_start = __phys_to_virt(new_start);
initrd_end = initrd_start + size;
pr_info("Moving initrd from [%llx-%llx] to [%llx-%llx]\n",
orig_start, orig_start + size - 1,
new_start, new_start + size - 1);
dest = (void *)initrd_start;
if (to_free) {
memcpy(dest, (void *)__phys_to_virt(orig_start), to_free);
dest += to_free;
}
copy_from_early_mem(dest, orig_start + to_free, size - to_free);
if (to_free) {
pr_info("Freeing original RAMDISK from [%llx-%llx]\n",
orig_start, orig_start + to_free - 1);
memblock_free(orig_start, to_free);
}
}
#else
static inline void __init relocate_initrd(void)
{
}
#endif
u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
void __init setup_arch(char **cmdline_p)
@@ -372,6 +433,7 @@ void __init setup_arch(char **cmdline_p)
acpi_boot_table_init();
paging_init();
relocate_initrd();
request_standard_resources();
early_ioremap_reset();
+1 -5
View File
@@ -1140,13 +1140,9 @@ sba_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
#ifdef CONFIG_NUMA
{
int node = ioc->node;
struct page *page;
if (node == NUMA_NO_NODE)
node = numa_node_id();
page = alloc_pages_exact_node(node, flags, get_order(size));
page = alloc_pages_node(ioc->node, flags, get_order(size));
if (unlikely(!page))
return NULL;
+1 -1
View File
@@ -97,7 +97,7 @@ static int uncached_add_chunk(struct uncached_pool *uc_pool, int nid)
/* attempt to allocate a granule's worth of cached memory pages */
page = alloc_pages_exact_node(nid,
page = __alloc_pages_node(nid,
GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
IA64_GRANULE_SHIFT-PAGE_SHIFT);
if (!page) {
+1 -1
View File
@@ -92,7 +92,7 @@ static void *sn_dma_alloc_coherent(struct device *dev, size_t size,
*/
node = pcibus_to_node(pdev->bus);
if (likely(node >=0)) {
struct page *p = alloc_pages_exact_node(node,
struct page *p = __alloc_pages_node(node,
flags, get_order(size));
if (likely(p))
+1 -1
View File
@@ -123,7 +123,7 @@ static int __init cbe_ptcal_enable_on_node(int nid, int order)
area->nid = nid;
area->order = order;
area->pages = alloc_pages_exact_node(area->nid,
area->pages = __alloc_pages_node(area->nid,
GFP_KERNEL|__GFP_THISNODE,
area->order);
+1 -1
View File
@@ -14,7 +14,7 @@
#include <asm-generic/4level-fixup.h>
#include <linux/spinlock.h>
#include <linux/swap.h>
#include <linux/mm_types.h>
#include <asm/types.h>
#include <asm/pgtsrmmu.h>
#include <asm/vaddrs.h>
+1 -21
View File
@@ -317,15 +317,12 @@ static u64 __init get_ramdisk_size(void)
return ramdisk_size;
}
#define MAX_MAP_CHUNK (NR_FIX_BTMAPS << PAGE_SHIFT)
static void __init relocate_initrd(void)
{
/* Assume only end is not page aligned */
u64 ramdisk_image = get_ramdisk_image();
u64 ramdisk_size = get_ramdisk_size();
u64 area_size = PAGE_ALIGN(ramdisk_size);
unsigned long slop, clen, mapaddr;
char *p, *q;
/* We need to move the initrd down into directly mapped mem */
relocated_ramdisk = memblock_find_in_range(0, PFN_PHYS(max_pfn_mapped),
@@ -343,25 +340,8 @@ static void __init relocate_initrd(void)
printk(KERN_INFO "Allocated new RAMDISK: [mem %#010llx-%#010llx]\n",
relocated_ramdisk, relocated_ramdisk + ramdisk_size - 1);
q = (char *)initrd_start;
copy_from_early_mem((void *)initrd_start, ramdisk_image, ramdisk_size);
/* Copy the initrd */
while (ramdisk_size) {
slop = ramdisk_image & ~PAGE_MASK;
clen = ramdisk_size;
if (clen > MAX_MAP_CHUNK-slop)
clen = MAX_MAP_CHUNK-slop;
mapaddr = ramdisk_image & PAGE_MASK;
p = early_memremap(mapaddr, clen+slop);
memcpy(q, p+slop, clen);
early_memunmap(p, clen+slop);
q += clen;
ramdisk_image += clen;
ramdisk_size -= clen;
}
ramdisk_image = get_ramdisk_image();
ramdisk_size = get_ramdisk_size();
printk(KERN_INFO "Move RAMDISK from [mem %#010llx-%#010llx] to"
" [mem %#010llx-%#010llx]\n",
ramdisk_image, ramdisk_image + ramdisk_size - 1,
+1 -1
View File
@@ -3150,7 +3150,7 @@ static struct vmcs *alloc_vmcs_cpu(int cpu)
struct page *pages;
struct vmcs *vmcs;
pages = alloc_pages_exact_node(node, GFP_KERNEL, vmcs_config.order);
pages = __alloc_pages_node(node, GFP_KERNEL, vmcs_config.order);
if (!pages)
return NULL;
vmcs = page_address(pages);
+4 -2
View File
@@ -246,8 +246,10 @@ int __init numa_cleanup_meminfo(struct numa_meminfo *mi)
bi->start = max(bi->start, low);
bi->end = min(bi->end, high);
/* and there's no empty block */
if (bi->start >= bi->end)
/* and there's no empty or non-exist block */
if (bi->start >= bi->end ||
!memblock_overlaps_region(&memblock.memory,
bi->start, bi->end - bi->start))
numa_remove_memblk_from(i--, mi);
}
+17 -13
View File
@@ -388,7 +388,6 @@ static ssize_t comp_algorithm_store(struct device *dev,
static ssize_t compact_store(struct device *dev,
struct device_attribute *attr, const char *buf, size_t len)
{
unsigned long nr_migrated;
struct zram *zram = dev_to_zram(dev);
struct zram_meta *meta;
@@ -399,8 +398,7 @@ static ssize_t compact_store(struct device *dev,
}
meta = zram->meta;
nr_migrated = zs_compact(meta->mem_pool);
atomic64_add(nr_migrated, &zram->stats.num_migrated);
zs_compact(meta->mem_pool);
up_read(&zram->init_lock);
return len;
@@ -428,26 +426,31 @@ static ssize_t mm_stat_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct zram *zram = dev_to_zram(dev);
struct zs_pool_stats pool_stats;
u64 orig_size, mem_used = 0;
long max_used;
ssize_t ret;
memset(&pool_stats, 0x00, sizeof(struct zs_pool_stats));
down_read(&zram->init_lock);
if (init_done(zram))
if (init_done(zram)) {
mem_used = zs_get_total_pages(zram->meta->mem_pool);
zs_pool_stats(zram->meta->mem_pool, &pool_stats);
}
orig_size = atomic64_read(&zram->stats.pages_stored);
max_used = atomic_long_read(&zram->stats.max_used_pages);
ret = scnprintf(buf, PAGE_SIZE,
"%8llu %8llu %8llu %8lu %8ld %8llu %8llu\n",
"%8llu %8llu %8llu %8lu %8ld %8llu %8lu\n",
orig_size << PAGE_SHIFT,
(u64)atomic64_read(&zram->stats.compr_data_size),
mem_used << PAGE_SHIFT,
zram->limit_pages << PAGE_SHIFT,
max_used << PAGE_SHIFT,
(u64)atomic64_read(&zram->stats.zero_pages),
(u64)atomic64_read(&zram->stats.num_migrated));
pool_stats.pages_compacted);
up_read(&zram->init_lock);
return ret;
@@ -619,7 +622,7 @@ static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
uncmem = user_mem;
if (!uncmem) {
pr_info("Unable to allocate temp memory\n");
pr_err("Unable to allocate temp memory\n");
ret = -ENOMEM;
goto out_cleanup;
}
@@ -716,7 +719,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
handle = zs_malloc(meta->mem_pool, clen);
if (!handle) {
pr_info("Error allocating memory for compressed page: %u, size=%zu\n",
pr_err("Error allocating memory for compressed page: %u, size=%zu\n",
index, clen);
ret = -ENOMEM;
goto out;
@@ -1036,7 +1039,7 @@ static ssize_t disksize_store(struct device *dev,
comp = zcomp_create(zram->compressor, zram->max_comp_streams);
if (IS_ERR(comp)) {
pr_info("Cannot initialise %s compressing backend\n",
pr_err("Cannot initialise %s compressing backend\n",
zram->compressor);
err = PTR_ERR(comp);
goto out_free_meta;
@@ -1214,7 +1217,7 @@ static int zram_add(void)
/* gendisk structure */
zram->disk = alloc_disk(1);
if (!zram->disk) {
pr_warn("Error allocating disk structure for device %d\n",
pr_err("Error allocating disk structure for device %d\n",
device_id);
ret = -ENOMEM;
goto out_free_queue;
@@ -1263,7 +1266,8 @@ static int zram_add(void)
ret = sysfs_create_group(&disk_to_dev(zram->disk)->kobj,
&zram_disk_attr_group);
if (ret < 0) {
pr_warn("Error creating sysfs group");
pr_err("Error creating sysfs group for device %d\n",
device_id);
goto out_free_disk;
}
strlcpy(zram->compressor, default_compressor, sizeof(zram->compressor));
@@ -1403,13 +1407,13 @@ static int __init zram_init(void)
ret = class_register(&zram_control_class);
if (ret) {
pr_warn("Unable to register zram-control class\n");
pr_err("Unable to register zram-control class\n");
return ret;
}
zram_major = register_blkdev(0, "zram");
if (zram_major <= 0) {
pr_warn("Unable to get major number\n");
pr_err("Unable to get major number\n");
class_unregister(&zram_control_class);
return -EBUSY;
}
-1
View File
@@ -78,7 +78,6 @@ struct zram_stats {
atomic64_t compr_data_size; /* compressed size of pages stored */
atomic64_t num_reads; /* failed + successful */
atomic64_t num_writes; /* --do-- */
atomic64_t num_migrated; /* no. of migrated object */
atomic64_t failed_reads; /* can happen when memory is too low */
atomic64_t failed_writes; /* can happen when memory is too low */
atomic64_t invalid_io; /* non-page-aligned I/O requests */
+1 -1
View File
@@ -239,7 +239,7 @@ xpc_create_gru_mq_uv(unsigned int mq_size, int cpu, char *irq_name,
mq->mmr_blade = uv_cpu_to_blade_id(cpu);
nid = cpu_to_node(cpu);
page = alloc_pages_exact_node(nid,
page = __alloc_pages_node(nid,
GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
pg_order);
if (page == NULL) {

Some files were not shown because too many files have changed in this diff Show More