linux

mirror of https://github.com/armbian/linux.git synced 2026-01-06 10:13:00 -08:00

Author	SHA1	Message	Date
Christoph Lameter	1b4244647c	Use ZVC counters to establish exact size of dirtyable pages We can use the global ZVC counters to establish the exact size of the LRU and the free pages. This allows a more accurate determination of the dirty ratio. This patch will fix the broken ratio calculations if large amounts of memory are allocated to huge pages or other consumers that do not put the pages on to the LRU. Notes: - I did not add NR_SLAB_RECLAIMABLE to the calculation of the dirtyable pages. Those may be reclaimable but they are at this point not dirtyable. If NR_SLAB_RECLAIMABLE would be considered then a huge number of reclaimable pages would stop writeback from occurring. - This patch used to be in mm as the last one in a series of patches. It was removed when Linus updated the treatment of highmem because there was a conflict. I updated the patch to follow Linus' approach. This patch is needed to fulfill the claims made in the beginning of the patchset that is now in Linus' tree. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:51 -07:00
Christoph Lameter	476f35348e	Safer nr_node_ids and nr_node_ids determination and initial values The nr_cpu_ids value is currently only calculated in smp_init. However, it may be needed before (SLUB needs it on kmem_cache_init!) and other kernel components may also want to allocate dynamically sized per cpu array before smp_init. So move the determination of possible cpus into sched_init() where we already loop over all possible cpus early in boot. Also initialize both nr_node_ids and nr_cpu_ids with the highest value they could take. If we have accidental users before these values are determined then the current value of 0 may cause too small per cpu and per node arrays to be allocated. If it is set to the maximum possible then we only waste some memory for early boot users. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:51 -07:00
Jeremy Fitzhardinge	aee16b3cee	Add apply_to_page_range() which applies a function to a pte range Add a new mm function apply_to_page_range() which applies a given function to every pte in a given virtual address range in a given mm structure. This is a generic alternative to cut-and-pasting the Linux idiomatic pagetable walking code in every place that a sequence of PTEs must be accessed. Although this interface is intended to be useful in a wide range of situations, it is currently used specifically by several Xen subsystems, for example: to ensure that pagetables have been allocated for a virtual address range, and to construct batched special pagetable update requests to map I/O memory (in ioremap()). [akpm@linux-foundation.org: fix warning, unpleasantly] Signed-off-by: Ian Pratt <ian.pratt@xensource.com> Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Matt Mackall <mpm@waste.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:51 -07:00
Pekka Enberg	fd76bab2fa	slab: introduce krealloc This introduce krealloc() that reallocates memory while keeping the contents unchanged. The allocator avoids reallocation if the new size fits the currently used cache. I also added a simple non-optimized version for mm/slob.c for compatibility. [akpm@linux-foundation.org: fix warnings] Acked-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu> Acked-by: Matt Mackall <mpm@selenic.com> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:50 -07:00
David S. Miller	6a5b518f22	[MM]: sparse_init() should be __init. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-06 23:54:25 -07:00
Siddha, Suresh B	62918a0361	[PATCH] x86-64: skip cache_free_alien() on non NUMA Set use_alien_caches to 0 on non NUMA platforms. And avoid calling the cache_free_alien() when use_alien_caches is not set. This will avoid the cache miss that happens while dereferencing slabp to get nodeid. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <andi@firstfloor.org> Cc: Eric Dumazet <dada1@cosmosbay.com> Cc: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:18 +02:00
Jeremy Fitzhardinge	ce6234b529	[PATCH] i386: PARAVIRT: add kmap_atomic_pte for mapping highpte pages Xen and VMI both have special requirements when mapping a highmem pte page into the kernel address space. These can be dealt with by adding a new kmap_atomic_pte() function for mapping highptes, and hooking it into the paravirt_ops infrastructure. Xen specifically wants to map the pte page RO, so this patch exposes a helper function, kmap_atomic_prot, which maps the page with the specified page protections. This also adds a kmap_flush_unused() function to clear out the cached kmap mappings. Xen needs this to clear out any potential stray RW mappings of pages which will become part of a pagetable. [ Zach - vmi.c will need some attention after this patch. It wasn't immediately obvious to me what needs to be done. ] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com>	2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge	d6dd61c831	[PATCH] x86: PARAVIRT: add hooks to intercept mm creation and destruction Add hooks to allow a paravirt implementation to track the lifetime of an mm. Paravirtualization requires three hooks, but only two are needed in common code. They are: arch_dup_mmap, which is called when a new mmap is created at fork arch_exit_mmap, which is called when the last process reference to an mm is dropped, which typically happens on exit and exec. The third hook is activate_mm, which is called from the arch-specific activate_mm() macro/function, and so doesn't need stub versions for other architectures. It's called when an mm is first used. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: linux-arch@vger.kernel.org Cc: James Bottomley <James.Bottomley@SteelEye.com> Acked-by: Ingo Molnar <mingo@elte.hu>	2007-05-02 19:27:14 +02:00
Andi Kleen	0d08e0d3a9	[PATCH] x86-64: Fix vmalloc_32 to really allocate <4GB on 64bit platforms Ugly ifdef, but should handle all 64bit platforms that have suitable zones. On some like Altix it's probably impossible without IOMMU use to get memory <4GB this way, but they have to live with that. Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:12 +02:00
Linus Torvalds	da8ac5e0fa	Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6 * 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: (38 commits) [S390] SPIN_LOCK_UNLOCKED cleanup in drivers/s390 [S390] Clean up smp code in preparation for some larger changes. [S390] Remove debugging junk. [S390] Switch etr from tasklet to workqueue. [S390] split page_test_and_clear_dirty. [S390] Processor degradation notification. [S390] vtime: cleanup per_cpu usage. [S390] crypto: cleanup. [S390] sclp: fix coding style. [S390] vmlogrdr: stop IUCV connection in vmlogrdr_release. [S390] sclp: initialize early. [S390] ctc: kmalloc->kzalloc/casting cleanups. [S390] zfcpdump support. [S390] dasd: Add ipldev parameter. [S390] dasd: Add sysfs attribute status and generate uevents. [S390] Improved kernel stack overflow checking. [S390] Get rid of console setup functions. [S390] No execute support cleanup. [S390] Minor fault path optimization. [S390] Use generic bug. ...	2007-04-27 09:15:31 -07:00
Linus Torvalds	07db59bd6b	Change default dirty-writeback limits Do this really early in the 2.6.22-rc series, so that we'll get feedback. And don't change by half measures. Just cut the default dirty limit to a quarter of what it was, and see if anybody even notices. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-27 09:10:47 -07:00
Martin Schwidefsky	6c210482ae	[S390] split page_test_and_clear_dirty. The page_test_and_clear_dirty primitive really consists of two operations, page_test_dirty and the page_clear_dirty. The combination of the two is not an atomic operation, so it makes more sense to have two separate operations instead of one. In addition to the improved readability of the s390 version of SetPageUptodate, it now avoids the page_test_dirty operation which is an insert-storage-key-extended (iske) instruction which is an expensive operation. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2007-04-27 16:01:46 +02:00
Christoph Lameter	0e8c7d0fd5	page migration: fix NR_FILE_PAGES accounting NR_FILE_PAGES must be accounted for depending on the zone that the page belongs to. If we replace the page in the radix tree then we may have to shift the count to another zone. Suggested-by: Ethan Solomita <solo@google.com> Eventually-typed-in-by: Christoph Lameter <clameter@sgi.com> Cc: Martin Bligh <mbligh@mbligh.org> Cc: <stable@kernel.org> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-24 08:23:08 -07:00
Hugh Dickins	3d124cbba3	fix OOM killing processes wrongly thought MPOL_BIND I only have CONFIG_NUMA=y for build testing: surprised when trying a memhog to see lots of other processes killed with "No available memory (MPOL_BIND)". memhog is killed correctly once we initialize nodemask in constrained_alloc(). Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Christoph Lameter <clameter@sgi.com> Acked-by: William Irwin <bill.irwin@oracle.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-24 08:23:07 -07:00
David Rientjes	650a7c974f	oom: kill all threads that share mm with killed task oom_kill_task() calls __oom_kill_task() to OOM kill a selected task. When finding other threads that share an mm with that task, we need to kill those individual threads and not the same one. (Bug introduced by `f2a2a7108a`) Acked-by: William Irwin <bill.irwin@oracle.com> Acked-by: Christoph Lameter <clameter@engr.sgi.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Andrew Morton <akpm@osdl.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-24 08:11:49 -07:00
Wu, Bryan	6a04de6dbe	[PATCH] nommu: fix bug ip_conntrack does not work on nommu num_physpages is not exported out in mm/nommu.c, so the ip_conntrack module link will fail. Signed-off-by: Bryan Wu <bryan.wu@analog.com> Acked-By: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-12 15:31:42 -07:00
Linus Torvalds	8d00647f2c	Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6 * 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: [S390] cio: Fix handling of interrupt for csch(). [S390] page_mkclean data corruption.	2007-04-04 10:11:16 -07:00
David Howells	e94a40c508	[PATCH] SLAB: Mention slab name when listing corrupt objects Mention the slab name when listing corrupt objects. Although the function that released the memory is mentioned, that is frequently ambiguous as such functions often release several pieces of memory. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-04 08:51:52 -07:00
Martin Schwidefsky	6e1beb3c22	[S390] page_mkclean data corruption. The git commit `c2fda5fed8` which added the page_test_and_clear_dirty call to page_mkclean and the git commit `7658cc2892` which fixes the "nasty and subtle race in shared mmap'ed page writeback" problem in clear_page_dirty_for_io cause data corruption on s390. The effect of the two changes is that for every call to clear_page_dirty_for_io a page_test_and_clear_dirty is done. If the per page dirty bit is set set_page_dirty is called. Strangely clear_page_dirty_for_io is called for not-uptodate pages, e.g. over this call-chain: [<000000000007c0f2>] clear_page_dirty_for_io+0x12a/0x130 [<000000000007c494>] generic_writepages+0x258/0x3e0 [<000000000007c692>] do_writepages+0x76/0x7c [<00000000000c7a26>] __writeback_single_inode+0xba/0x3e4 [<00000000000c831a>] sync_sb_inodes+0x23e/0x398 [<00000000000c8802>] writeback_inodes+0x12e/0x140 [<000000000007b9ee>] wb_kupdate+0xd2/0x178 [<000000000007cca2>] pdflush+0x162/0x23c The bad news now is that page_test_and_clear_dirty might claim that a not-uptodate page is dirty since SetPageUptodate which resets the per page dirty bit has not yet been called. The page writeback that follows clobbers the data on disk. The simplest solution to this problem is to move the call to page_test_and_clear_dirty under the "if (page_mapped(page))". If a file backed page is mapped it is uptodate. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2007-04-04 14:37:39 +02:00
Carsten Otte	a76c0b9763	[PATCH] mm: fix xip issue with /dev/zero Fix the bug, that reading into xip mapping from /dev/zero fills the user page table with ZERO_PAGE() entries. Later on, xip cannot tell which pages have been ZERO_PAGE() filled by access to a sparse mapping, and which ones origin from /dev/zero. It will unmap ZERO_PAGE from all mappings when filling the sparse hole with data. xip does now use its own zeroed page for its sparse mappings. Please apply. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-29 08:22:26 -07:00
Hugh Dickins	90ed52ebe4	[PATCH] holepunch: fix mmap_sem i_mutex deadlock sys_madvise has down_write of mmap_sem, then madvise_remove calls vmtruncate_range which takes i_mutex and i_alloc_sem: no, we can easily devise deadlocks from that ordering. madvise_remove drop mmap_sem while calling vmtruncate_range: luckily, since madvise_remove doesn't split or merge vmas, it's easy to handle this case with a NULL prev, without restructuring sys_madvise. (Though sad to retake mmap_sem when it's unlikely to be needed, and certainly down_read is sufficient for MADV_REMOVE, unlike the other madvices.) Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-29 08:22:26 -07:00
Hugh Dickins	16a100190d	[PATCH] holepunch: fix disconnected pages after second truncate shmem_truncate_range has its own truncate_inode_pages_range, to free any pages racily instantiated while it was in progress: a SHMEM_PAGEIN flag is set when this might have happened. But holepunching gets no chance to clear that flag at the start of vmtruncate_range, so it's always set (unless a truncate came just before), so holepunch almost always does this second truncate_inode_pages_range. shmem holepunch has unlikely swap<->file races hereabouts whatever we do (without a fuller rework than is fit for this release): I was going to skip the second truncate in the punch_hole case, but Miklos points out that would make holepunch correctness more vulnerable to swapoff. So keep the second truncate, but follow it by an unmap_mapping_range to eliminate the disconnected pages (freed from pagecache while still mapped in userspace) that it might have left behind. Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-29 08:22:25 -07:00
Hugh Dickins	1ae7000630	[PATCH] holepunch: fix shmem_truncate_range punch locking Miklos Szeredi observes that during truncation of shmem page directories, info->lock is released to improve latency (after lowering i_size and next_index to exclude races); but this is quite wrong for holepunching, which receives no such protection from i_size or next_index, and is left vulnerable to races with shmem_unuse, shmem_getpage and shmem_writepage. Hold info->lock throughout when holepunching? No, any user could prevent rescheduling for far too long. Instead take info->lock just when needed: in shmem_free_swp when removing the swap entries, and whenever removing a directory page from the level above. But so long as we remove before scanning, we can safely skip taking the lock at the lower levels, except at misaligned start and end of the hole. Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-29 08:22:25 -07:00
Hugh Dickins	a2646d1e6c	[PATCH] holepunch: fix shmem_truncate_range punching too far Miklos Szeredi observes BUG_ON(!entry) in shmem_writepage() triggered in rare circumstances, because shmem_truncate_range() erroneously removes partially truncated directory pages at the end of the range: later reclaim on pages pointing to these removed directories triggers the BUG. Indeed, and it can also cause data loss beyond the hole. Fix this as in the patch proposed by Miklos, but distinguish between "limit" (how far we need to search: ignore truncation's next_index optimization in the holepunch case - if there are races it's more consistent to act on the whole range specified) and "upper_limit" (how far we can free directory pages: generally we must be careful to keep partially punched pages, but can relax at end of file - i_size being held stable by i_mutex). Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Miklos Szeredi <mszeredi@suse.cs> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-29 08:22:25 -07:00
Vasily Tarasov	f772b3d9ca	block: blk_max_pfn is somtimes wrong There is a small problem in handling page bounce. At the moment blk_max_pfn equals max_pfn, which is in fact not maximum possible _number_ of a page frame, but the _amount_ of page frames. For example for the 32bit x86 node with 4Gb RAM, max_pfn = 0x100000, but not 0xFFFF. request_queue structure has a member q->bounce_pfn and queue needs bounce pages for the pages _above_ this limit. This routine is handled by blk_queue_bounce(), where the following check is produced: if (q->bounce_pfn >= blk_max_pfn) return; Assume, that a driver has set q->bounce_pfn to 0xFFFF, but blk_max_pfn equals 0x10000. In such situation the check above fails and for each bio we always fall down for iterating over pages tied to the bio. I want to notice, that for quite a big range of device drivers (ide, md, ...) such problem doesn't happen because they use BLK_BOUNCE_ANY for bounce_pfn. BLK_BOUNCE_ANY is defined as blk_max_pfn << PAGE_SHIFT, and then the check above doesn't fail. But for other drivers, which obtain required value from drivers, it fails. For example sata_nv uses ATA_DMA_MASK or dev->dma_mask. I propose to use (max_pfn - 1) for blk_max_pfn. And the same for blk_max_low_pfn. The patch also cleanses some checks related with bounce_pfn. Signed-off-by: Vasily Tarasov <vtaras@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-03-27 08:52:47 +02:00

2180 Commits