There is a subtle bug when calculating a number of acquired objects.
Currently, we calculate "available = page->objects - page->inuse",
after acquire_slab() is called in get_partial_node().
In acquire_slab() with mode = 1, we always set new.inuse = page->objects.
So,
acquire_slab(s, n, page, object == NULL);
if (!object) {
c->page = page;
stat(s, ALLOC_FROM_PARTIAL);
object = t;
available = page->objects - page->inuse;
!!! availabe is always 0 !!!
...
Therfore, "available > s->cpu_partial / 2" is always false and
we always go to second iteration.
This patch correct this problem.
After that, we don't need return value of put_cpu_partial().
So remove it.
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
After we create a boot cache, we may allocate from it until it is bootstraped.
This will move the page from the partial list to the cpu slab list. If this
happens, the loop:
list_for_each_entry(p, &n->partial, lru)
that we use to scan for all partial pages will yield nothing, and the pages
will keep pointing to the boot cpu cache, which is of course, invalid. To do
that, we should flush the cache to make sure that the cpu slab is back to the
partial list.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Reported-by: Steffen Michalke <StMichalke@web.de>
Tested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
commit "slab: Common Kmalloc cache determination" made mistake
in kmalloc_slab(). SLAB_CACHE_DMA is for kmem_cache creation,
not for allocation. For allocation, we should use GFP_XXX to identify
type of allocation. So, change SLAB_CACHE_DMA to GFP_DMA.
Acked-by: Christoph Lameter <cl@linux.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Put the definitions for the kmem_cache_node structures together so that
we have one structure. That will allow us to create more common fields in
the future which could yield more opportunities to share code.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
The list3 or l3 pointers are pointing to per node structures. Reflect
that in the names of variables used.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Extract the optimized lookup functions from slub and put them into
slab_common.c. Then make slab use these functions as well.
Joonsoo notes that this fixes some issues with constant folding which
also reduces the code size for slub.
https://lkml.org/lkml/2012/10/20/82
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
The kmalloc array is created in similar ways in both SLAB
and SLUB. Create a common function and have both allocators
call that function.
V1->V2:
Whitespace cleanup
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Have a common definition fo the kmalloc cache arrays in
SLAB and SLUB
Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Standardize the constants that describe the smallest and largest
object kept in the kmalloc arrays for SLAB and SLUB.
Differentiate between the maximum size for which a slab cache is used
(KMALLOC_MAX_CACHE_SIZE) and the maximum allocatable size
(KMALLOC_MAX_SIZE, KMALLOC_MAX_ORDER).
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Have a common naming between both slab caches for future changes.
Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Rename the structure used for the per node structures in slab
to have a name that expresses that fact.
Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Make slab use the common functions. We can get rid of a lot
of old ugly stuff as a results. Among them the sizes
array and the weird include/linux/kmalloc_sizes file and
some pretty bad #include statements in slab_def.h.
The one thing that is different in slab is that the 32 byte
cache will also be created for arches that have page sizes
larger than 4K. There are numerous smaller allocations that
SLOB and SLUB can handle better because of their support for
smaller allocation sizes so lets keep the 32 byte slab also
for arches with > 4K pages.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Merge the rest of Andrew's patches for -rc1:
"A bunch of fixes and misc missed-out-on things.
That'll do for -rc1. I still have a batch of IPC patches which still
have a possible bug report which I'm chasing down."
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (25 commits)
keys: use keyring_alloc() to create module signing keyring
keys: fix unreachable code
sendfile: allows bypassing of notifier events
SGI-XP: handle non-fatal traps
fat: fix incorrect function comment
Documentation: ABI: remove testing/sysfs-devices-node
proc: fix inconsistent lock state
linux/kernel.h: fix DIV_ROUND_CLOSEST with unsigned divisors
memcg: don't register hotcpu notifier from ->css_alloc()
checkpatch: warn on uapi #includes that #include <uapi/...
revert "rtc: recycle id when unloading a rtc driver"
mm: clean up transparent hugepage sysfs error messages
hfsplus: add error message for the case of failure of sync fs in delayed_sync_fs() method
hfsplus: rework processing of hfs_btree_write() returned error
hfsplus: rework processing errors in hfsplus_free_extents()
hfsplus: avoid crash on failed block map free
kcmp: include linux/ptrace.h
drivers/rtc/rtc-imxdi.c: must include <linux/spinlock.h>
mm: cma: WARN if freed memory is still in use
exec: do not leave bprm->interp on stack
...
Memory returned to free_contig_range() must have no other references.
Let kernel to complain loudly if page reference count is not equal to 1.
[rientjes@google.com: support sparsemem]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The system uses global_dirtyable_memory() to calculate number of
dirtyable pages/pages that can be allocated to the page cache. A bug
causes an underflow thus making the page count look like a big unsigned
number. This in turn confuses the dirty writeback throttling to
aggressively write back pages as they become dirty (usually 1 page at a
time). This generally only affects systems with highmem because the
underflowed count gets subtracted from the global count of dirtyable
memory.
The problem was introduced with v3.2-4896-gab8fabd
Fix is to ensure we don't get an underflowed total of either highmem or
global dirtyable memory.
Signed-off-by: Sonny Rao <sonnyrao@chromium.org>
Signed-off-by: Puneet Kumar <puneetster@chromium.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Damien Wyart <damien.wyart@free.fr>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
isolate_freepages_block() and isolate_migratepages_range() are used for
CMA as well as compaction so it breaks build for CONFIG_CMA &&
!CONFIG_COMPACTION.
This patch fixes it.
[akpm@linux-foundation.org: add "do { } while (0)", per Mel]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull virtio update from Rusty Russell:
"Some nice cleanups, and even a patch my wife did as a "live" demo for
Latinoware 2012.
There's a slightly non-trivial merge in virtio-net, as we cleaned up
the virtio add_buf interface while DaveM accepted the mq virtio-net
patches."
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (27 commits)
virtio_console: Add support for remoteproc serial
virtio_console: Merge struct buffer_token into struct port_buffer
virtio: add drv_to_virtio to make code clearly
virtio: use dev_to_virtio wrapper in virtio
virtio-mmio: Fix irq parsing in command line parameter
virtio_console: Free buffers from out-queue upon close
virtio: Convert dev_printk(KERN_<LEVEL> to dev_<level>(
virtio_console: Use kmalloc instead of kzalloc
virtio_console: Free buffer if splice fails
virtio: tools: make it clear that virtqueue_add_buf() no longer returns > 0
virtio: scsi: make it clear that virtqueue_add_buf() no longer returns > 0
virtio: rpmsg: make it clear that virtqueue_add_buf() no longer returns > 0
virtio: net: make it clear that virtqueue_add_buf() no longer returns > 0
virtio: console: make it clear that virtqueue_add_buf() no longer returns > 0
virtio: make virtqueue_add_buf() returning 0 on success, not capacity.
virtio: console: don't rely on virtqueue_add_buf() returning capacity.
virtio_net: don't rely on virtqueue_add_buf() returning capacity.
virtio-net: remove unused skb_vnet_hdr->num_sg field
virtio-net: correct capacity math on ring full
virtio: move queue_index and num_free fields into core struct virtqueue.
...
The rmap walks in ksm.c are like those in rmap.c: they can safely be
done with anon_vma_lock_read().
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On a 4GB RAM machine, where Normal zone is much smaller than DMA32 zone,
the Normal zone gets fragmented in time. This requires relatively more
pressure in balance_pgdat to get the zone above the required watermark.
Unfortunately, the congestion_wait() call in there slows it down for a
completely wrong reason, expecting that there's a lot of
writeback/swapout, even when there's none (much more common). After a
few days, when fragmentation progresses, this flawed logic translates to
a very high CPU iowait times, even though there's no I/O congestion at
all. If THP is enabled, the problem occurs sooner, but I was able to
see it even on !THP kernels, just by giving it a bit more time to occur.
The proper way to deal with this is to not wait, unless there's
congestion. Thanks to Mel Gorman, we already have the function that
perfectly fits the job. The patch was tested on a machine which nicely
revealed the problem after only 1 day of uptime, and it's been working
great.
Signed-off-by: Zlatko Calusic <zlatko.calusic@iskon.hr>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>