Commit Graph

253648 Commits

Author SHA1 Message Date
Jean Delvare b0461a44a2 MAINTAINERS: add entry for legacy eeprom driver
I shall maintain the legacy eeprom driver, until we finally get rid of it.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:01 -07:00
KAMEZAWA Hiroyuki fbc29a25e4 memcg: avoid percpu cached charge draining at softlimit
Based on Michal Hocko's comment.

We are not draining per cpu cached charges during soft limit reclaim
because background reclaim doesn't care about charges.  It tries to free
some memory and charges will not give any.

Cached charges might influence only selection of the biggest soft limit
offender but as the call is done only after the selection has been already
done it makes no change.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:01 -07:00
KAMEZAWA Hiroyuki 26fe616844 memcg: fix percpu cached charge draining frequency
For performance, memory cgroup caches some "charge" from res_counter into
per cpu cache.  This works well but because it's cache, it needs to be
flushed in some cases.  Typical cases are

   1. when someone hit limit.

   2. when rmdir() is called and need to charges to be 0.

But "1" has problem.

Recently, with large SMP machines, we see many kworker runs because of
flushing memcg's cache.  Bad things in implementation are that even if a
cpu contains a cache for memcg not related to a memcg which hits limit,
drain code is called.

This patch does
        A) check percpu cache contains a useful data or not.
        B) check other asynchronous percpu draining doesn't run.
        C) don't call local cpu callback.

(*)This patch avoid changing the calling condition with hard-limit.

When I run "cat 1Gfile > /dev/null" under 300M limit memcg,

[Before]
13767 kamezawa  20   0 98.6m  424  416 D 10.0  0.0   0:00.61 cat
   58 root      20   0     0    0    0 S  0.6  0.0   0:00.09 kworker/2:1
   60 root      20   0     0    0    0 S  0.6  0.0   0:00.08 kworker/4:1
    4 root      20   0     0    0    0 S  0.3  0.0   0:00.02 kworker/0:0
   57 root      20   0     0    0    0 S  0.3  0.0   0:00.05 kworker/1:1
   61 root      20   0     0    0    0 S  0.3  0.0   0:00.05 kworker/5:1
   62 root      20   0     0    0    0 S  0.3  0.0   0:00.05 kworker/6:1
   63 root      20   0     0    0    0 S  0.3  0.0   0:00.05 kworker/7:1

[After]
 2676 root      20   0 98.6m  416  416 D  9.3  0.0   0:00.87 cat
 2626 kamezawa  20   0 15192 1312  920 R  0.3  0.0   0:00.28 top
    1 root      20   0 19384 1496 1204 S  0.0  0.0   0:00.66 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
    4 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kworker/0:0

[akpm@linux-foundation.org: make percpu_charge_mutex static, tweak comments]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:01 -07:00
KAMEZAWA Hiroyuki 7ae534d074 memcg: fix wrong check of noswap with softlimit
Hierarchical reclaim doesn't swap out if memsw and resource limits are
thye same (memsw_is_minimum == true) because we would hit mem+swap limit
anyway (during hard limit reclaim).

If it comes to the soft limit we shouldn't consider memsw_is_minimum at
all because it doesn't make much sense.  Either the soft limit is bellow
the hard limit and then we cannot hit mem+swap limit or the direct reclaim
takes a precedence.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:01 -07:00
KAMEZAWA Hiroyuki 733eda7ac3 memcg: clear mm->owner when last possible owner leaves
The following crash was reported:

> Call Trace:
> [<ffffffff81139792>] mem_cgroup_from_task+0x15/0x17
> [<ffffffff8113a75a>] __mem_cgroup_try_charge+0x148/0x4b4
> [<ffffffff810493f3>] ? need_resched+0x23/0x2d
> [<ffffffff814cbf43>] ? preempt_schedule+0x46/0x4f
> [<ffffffff8113afe8>] mem_cgroup_charge_common+0x9a/0xce
> [<ffffffff8113b6d1>] mem_cgroup_newpage_charge+0x5d/0x5f
> [<ffffffff81134024>] khugepaged+0x5da/0xfaf
> [<ffffffff81078ea0>] ? __init_waitqueue_head+0x4b/0x4b
> [<ffffffff81133a4a>] ? add_mm_counter.constprop.5+0x13/0x13
> [<ffffffff81078625>] kthread+0xa8/0xb0
> [<ffffffff814d13e8>] ? sub_preempt_count+0xa1/0xb4
> [<ffffffff814d5664>] kernel_thread_helper+0x4/0x10
> [<ffffffff814ce858>] ? retint_restore_args+0x13/0x13
> [<ffffffff8107857d>] ? __init_kthread_worker+0x5a/0x5a

What happens is that khugepaged tries to charge a huge page against an mm
whose last possible owner has already exited, and the memory controller
crashes when the stale mm->owner is used to look up the cgroup to charge.

mm->owner has never been set to NULL with the last owner going away, but
nobody cared until khugepaged came along.

Even then it wasn't a problem because the final mmput() on an mm was
forced to acquire and release mmap_sem in write-mode, preventing an
exiting owner to go away while the mmap_sem was held, and until "692e0b3
mm: thp: optimize memcg charge in khugepaged", the memory cgroup charge
was protected by mmap_sem in read-mode.

Instead of going back to relying on the mmap_sem to enforce lifetime of a
task, this patch ensures that mm->owner is properly set to NULL when the
last possible owner is exiting, which the memory controller can handle
just fine.

[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Hugh Dickins <hughd@google.com>
Reported-by: Dave Jones <davej@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:01 -07:00
KAMEZAWA Hiroyuki 37573e8c71 memcg: fix init_page_cgroup nid with sparsemem
Commit 21a3c96468 ("memcg: allocate memory cgroup structures in local
nodes") makes page_cgroup allocation as NUMA aware.  But that caused a
problem https://bugzilla.kernel.org/show_bug.cgi?id=36192.

The problem was getting a NID from invalid struct pages, which was not
initialized because it was out-of-node, out of [node_start_pfn,
node_end_pfn)

Now, with sparsemem, page_cgroup_init scans pfn from 0 to max_pfn.  But
this may scan a pfn which is not on any node and can access memmap which
is not initialized.

This makes page_cgroup_init() for SPARSEMEM node aware and remove a code
to get nid from page->flags.  (Then, we'll use valid NID always.)

[akpm@linux-foundation.org: try to fix up comments]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:01 -07:00
KAMEZAWA Hiroyuki 8957712710 mm: memory.numa_stat: fix file permission
Commit 406eb0c9ba ("memcg: add memory.numastat api for numa
statistics") adds memory.numa_stat file for memory cgroup.  But the file
permissions are wrong.

  [kamezawa@bluextal linux-2.6]$ ls -l /cgroup/memory/A/memory.numa_stat
  ---------- 1 root root 0 Jun  9 18:36 /cgroup/memory/A/memory.numa_stat

This patch fixes the permission as

  [root@bluextal kamezawa]# ls -l /cgroup/memory/A/memory.numa_stat
  -r--r--r-- 1 root root 0 Jun 10 16:49 /cgroup/memory/A/memory.numa_stat

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:01 -07:00
Eric Miao 45d16f09dd leds: fix the incorrect display in menuconfig
Seems when a config option does not have a dependency of the menuconfig,
it messes the display of the rest configs, even if it's a hidden one.

Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:01 -07:00
Rafael Aquini b0320c7b7d mm: fix negative commitlimit when gigantic hugepages are allocated
When 1GB hugepages are allocated on a system, free(1) reports less
available memory than what really is installed in the box.  Also, if the
total size of hugepages allocated on a system is over half of the total
memory size, CommitLimit becomes a negative number.

The problem is that gigantic hugepages (order > MAX_ORDER) can only be
allocated at boot with bootmem, thus its frames are not accounted to
'totalram_pages'.  However, they are accounted to hugetlb_total_pages()

What happens to turn CommitLimit into a negative number is this
calculation, in fs/proc/meminfo.c:

        allowed = ((totalram_pages - hugetlb_total_pages())
                * sysctl_overcommit_ratio / 100) + total_swap_pages;

A similar calculation occurs in __vm_enough_memory() in mm/mmap.c.

Also, every vm statistic which depends on 'totalram_pages' will render
confusing values, as if system were 'missing' some part of its memory.

Impact of this bug:

When gigantic hugepages are allocated and sysctl_overcommit_memory ==
OVERCOMMIT_NEVER.  In a such situation, __vm_enough_memory() goes through
the mentioned 'allowed' calculation and might end up mistakenly returning
-ENOMEM, thus forcing the system to start reclaiming pages earlier than it
would be ususal, and this could cause detrimental impact to overall
system's performance, depending on the workload.

Besides the aforementioned scenario, I can only think of this causing
annoyances with memory reports from /proc/meminfo and free(1).

[akpm@linux-foundation.org: standardize comment layout]
Reported-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Rafael Aquini <aquini@linux.com>
Acked-by: Russ Anderson <rja@sgi.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:01 -07:00
KAMEZAWA Hiroyuki 959ecc48fc mm/memory_hotplug.c: fix building of node hotplug zonelist
During memory hotplug we refresh zonelists when we online a page in a new
zone.  It means that the node's zonelist is not initialized until pages
are onlined.  So for example, "nid" passed by MEM_GOING_ONLINE notifier
will point to NODE_DATA(nid) which has no zone fallback list.  Moreover,
if we hot-add cpu-only nodes, alloc_pages() will do no fallback.

This patch makes a zonelist when a new pgdata is available.

Note: in production, at fujitsu, memory should be onlined before cpu
      and our server didn't have any memory-less nodes and had no problems.

      But recent changes in MEM_GOING_ONLINE+page_cgroup
      will access not initialized zonelist of node.
      Anyway, there are memory-less node and we need some care.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:01 -07:00
Borislav Petkov de695e159e init/calibrate.c: remove annoying printk
Remove calibrate_delay_direct()'s KERN_DEBUG printk related to bogomips
calculation as it appears when booting every core on setups with
'ignore_loglevel' which dmesg people scan for possible issues.  As the
message doesn't show very useful information to the widest audience of
kernel boot message gazers, it should be removed.

Introduced by commit d2b463135f ("init/calibrate.c: fix for critical
bogoMIPS intermittent calculation failure").

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andrew Worsley <amworsley@gmail.com>
Cc: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:01 -07:00
Geert Uytterhoeven 26575f9544 w1: W1_MASTER_DS1WM should depend on GENERIC_HARDIRQS
On m68k (which doesn't support generic hardirqs yet):

  drivers/w1/masters/ds1wm.c: In function `ds1wm_probe':
  drivers/w1/masters/ds1wm.c: error: implicit declaration of function `irq_set_irq_type'

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Jean-Franois Dagenais <dagenaisj@sonatest.com>
Cc: Matt Reimer <mreimer@vpop.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:00 -07:00
Nicolas Kaiser 49b24d6b41 include/asm-generic/pgtable.h: fix unbalanced parenthesis
Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:00 -07:00
Pawel Osciak 9e6f343852 MAINTAINERS: add videobuf2 maintainers
Add maintainers for the videobuf2 V4L2 driver framework.

Signed-off-by: Pawel Osciak <pawel@osciak.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:00 -07:00
Uwe Kleine-König be5ce2f1c9 leds: move LEDS_GPIO_REGISTER out of menuconfig NEW_LEDS
Commit 4440673a95 ("leds: provide helper to register "leds-gpio"
devices") broke the display of the NEW_LEDS menu as it didn't depend on
NEW_LEDS and so made "LED drivers" and "LED Triggers" appear at the same
level as "LED Support" instead of below it as it was before 4440673a.

Moving LEDS_GPIO_REGISTER out of the menuconfig NEW_LEDS fixes this
unintended side effect.

Reported-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:00 -07:00
Axel Lin 9d8f776bfb drivers/leds/leds-asic3: make LEDS_ASIC3 depend on LEDS_CLASS
We call led_classdev_unregister/led_classdev_register in
asic3_led_remove/asic3_led_probe, thus make LEDS_ASIC3 depend on
LEDS_CLASS.

This patch fixes below build error if LEDS_CLASS is not configured.

    LD      .tmp_vmlinux1
  drivers/built-in.o: In function `asic3_led_remove':
  clkdev.c:(.devexit.text+0x1860): undefined reference to `led_classdev_unregister'
  drivers/built-in.o: In function `asic3_led_probe':
  clkdev.c:(.devinit.text+0xcee8): undefined reference to `led_classdev_register'
  make: *** [.tmp_vmlinux1] Error 1

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Paul Parsons <lost.distance@yahoo.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:00 -07:00
Balbir Singh 185e595f77 MAINTAINERS: Balbir has moved
Update my email address.  Email will start to the old address bouncing
soon

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:00 -07:00
Josh Triplett bd5dc17be8 uts: make default hostname configurable, rather than always using "(none)"
The "hostname" tool falls back to setting the hostname to "localhost" if
/etc/hostname does not exist.  Distribution init scripts have the same
fallback.  However, if userspace never calls sethostname, such as when
booting with init=/bin/sh, or otherwise booting a minimal system without
the usual init scripts, the default hostname of "(none)" remains,
unhelpfully appearing in various places such as prompts ("root@(none):~#")
and logs.  Furthermore, "(none)" doesn't typically resolve to anything
useful.

Make the default hostname configurable.  This removes the need for the
standard fallback, provides a useful default for systems that never call
sethostname, and makes minimal systems that much more useful with less
configuration.  Distributions could choose to use "localhost" here to
avoid the fallback, while embedded systems may wish to use a specific
target hostname.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: David Miller <davem@davemloft.net>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Kel Modderman <kel@otaku42.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:00 -07:00
Dr. David Alan Gilbert ca39599c63 BUILD_BUG_ON_ZERO: fix sparse breakage
BUILD_BUG_ON_ZERO and BUILD_BUG_ON_NULL must return values, even in the
CHECKER case otherwise various users of it become syntactically invalid.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:00 -07:00
Michal Hocko 3957c7768e mm: compaction: fix special case -1 order checks
Commit 56de7263fc ("mm: compaction: direct compact when a high-order
allocation fails") introduced a check for cc->order == -1 in
compact_finished.  We should continue compacting in that case because
the request came from userspace and there is no particular order to
compact for.  Similar check has been added by 82478fb7 (mm: compaction:
prevent division-by-zero during user-requested compaction) for
compaction_suitable.

The check is, however, done after zone_watermark_ok which uses order as a
right hand argument for shifts.  Not only watermark check is pointless if
we can break out without it but it also uses 1 << -1 which is not well
defined (at least from C standard).  Let's move the -1 check above
zone_watermark_ok.

[minchan.kim@gmail.com> - caught compaction_suitable]
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hioryu@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:00 -07:00
Steven Rostedt 5f1a19070b mm: fix wrong kunmap_atomic() pointer
Running a ktest.pl test, I hit the following bug on x86_32:

  ------------[ cut here ]------------
  WARNING: at arch/x86/mm/highmem_32.c:81 __kunmap_atomic+0x64/0xc1()
   Hardware name:
  Modules linked in:
  Pid: 93, comm: sh Not tainted 2.6.39-test+ #1
  Call Trace:
   [<c04450da>] warn_slowpath_common+0x7c/0x91
   [<c042f5df>] ? __kunmap_atomic+0x64/0xc1
   [<c042f5df>] ? __kunmap_atomic+0x64/0xc1^M
   [<c0445111>] warn_slowpath_null+0x22/0x24
   [<c042f5df>] __kunmap_atomic+0x64/0xc1
   [<c04d4a22>] unmap_vmas+0x43a/0x4e0
   [<c04d9065>] exit_mmap+0x91/0xd2
   [<c0443057>] mmput+0x43/0xad
   [<c0448358>] exit_mm+0x111/0x119
   [<c044855f>] do_exit+0x1ff/0x5fa
   [<c0454ea2>] ? set_current_blocked+0x3c/0x40
   [<c0454f24>] ? sigprocmask+0x7e/0x8e
   [<c0448b55>] do_group_exit+0x65/0x88
   [<c0448b90>] sys_exit_group+0x18/0x1c
   [<c0c3915f>] sysenter_do_call+0x12/0x38
  ---[ end trace 8055f74ea3c0eb62 ]---

Running a ktest.pl git bisect, found the culprit: commit e303297e6c
("mm: extended batches for generic mmu_gather")

But although this was the commit triggering the bug, it was not the one
originally responsible for the bug.  That was commit d16dfc550f ("mm:
mmu_gather rework").

The code in zap_pte_range() has something that looks like the following:

	pte =  pte_offset_map_lock(mm, pmd, addr, &ptl);
	do {
		[...]
	} while (pte++, addr += PAGE_SIZE, addr != end);
	pte_unmap_unlock(pte - 1, ptl);

The pte starts off pointing at the first element in the page table
directory that was returned by the pte_offset_map_lock().  When it's done
with the page, pte will be pointing to anything between the next entry and
the first entry of the next page inclusive.  By doing a pte - 1, this puts
the pte back onto the original page, which is all that pte_unmap_unlock()
needs.

In most archs (64 bit), this is not an issue as the pte is ignored in the
pte_unmap_unlock().  But on 32 bit archs, where things may be kmapped, it
is essential that the pte passed to pte_unmap_unlock() resides on the same
page that was given by pte_offest_map_lock().

The problem came in d16dfc55 ("mm: mmu_gather rework") where it introduced
a "break;" from the while loop.  This alone did not seem to easily trigger
the bug.  But the modifications made by e303297e6 caused that "break;" to
be hit on the first iteration, before the pte++.

The pte not being incremented will now cause pte_unmap_unlock(pte - 1) to
be pointing to the previous page.  This will cause the wrong page to be
unmapped, and also trigger the warning above.

The simple solution is to just save the pointer given by
pte_offset_map_lock() and use it in the unlock.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:00 -07:00
Christian Gmeiner 4bbd61fb97 drivers/misc/cs5535-mfgpt.c: fix wrong if condition
Fix the wrong `if' condition for the check if the requested timer is
available.

The bitmap avail is used to store if a timer is used already.  test_bit()
is used to check if the requested timer is available.  If a bit in the
avail bitmap is set it means that the timer is available.

The runtime effect would be that allocating a specific timer always fails
(versus telling cs5535_mfgpt_alloc_timer to allocate the first available
timer, which works).

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:00 -07:00
Axel Lin 5a1e6f7583 drivers/misc/spear13xx_pcie_gadget.c: fix a memory leak in spear_pcie_gadget_probe error path
In the case of goto err_kzalloc, we should kfree target.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Pratyush Anand <pratyush.anand@st.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:00 -07:00
KOSAKI Motohiro 32e45ff43e mm: increase RECLAIM_DISTANCE to 30
Recently, Robert Mueller reported (http://lkml.org/lkml/2010/9/12/236)
that zone_reclaim_mode doesn't work properly on his new NUMA server (Dual
Xeon E5520 + Intel S5520UR MB).  He is using Cyrus IMAPd and it's built on
a very traditional single-process model.

  * a master process which reads config files and manages the other
    process
  * multiple imapd processes, one per connection
  * multiple pop3d processes, one per connection
  * multiple lmtpd processes, one per connection
  * periodical "cleanup" processes.

There are thousands of independent processes.  The problem is, recent
Intel motherboard turn on zone_reclaim_mode by default and traditional
prefork model software don't work well on it.  Unfortunatelly, such models
are still typical even in the 21st century.  We can't ignore them.

This patch raises the zone_reclaim_mode threshold to 30.  30 doesn't have
any specific meaning.  but 20 means that one-hop QPI/Hypertransport and
such relatively cheap 2-4 socket machine are often used for traditional
servers as above.  The intention is that these machines don't use
zone_reclaim_mode.

Note: ia64 and Power have arch specific RECLAIM_DISTANCE definitions.
This patch doesn't change such high-end NUMA machine behavior.

Dave Hansen said:

: I know specifically of pieces of x86 hardware that set the information
: in the BIOS to '21' *specifically* so they'll get the zone_reclaim_mode
: behavior which that implies.
:
: They've done performance testing and run very large and scary benchmarks
: to make sure that they _want_ this turned on.  What this means for them
: is that they'll probably be de-optimized, at least on newer versions of
: the kernel.
:
: If you want to do this for particular systems, maybe _that_'s what we
: should do.  Have a list of specific configurations that need the
: defaults overridden either because they're buggy, or they have an
: unusual hardware configuration not really reflected in the distance
: table.

And later said:

: The original change in the hardware tables was for the benefit of a
: benchmark.  Said benchmark isn't going to get run on mainline until the
: next batch of enterprise distros drops, at which point the hardware where
: this was done will be irrelevant for the benchmark.  I'm sure any new
: hardware will just set this distance to another yet arbitrary value to
: make the kernel do what it wants.  :)
:
: Also, when the hardware got _set_ to this initially, I complained.  So, I
: guess I'm getting my way now, with this patch.  I'm cool with it.

Reported-by: Robert Mueller <robm@fastmail.fm>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:03:59 -07:00
Joe Perches 17441227f6 checkpatch: add warning for uses of printk_ratelimit
Warn about uses of printk_ratelimit() because it uses a global state and
can hide subsequent useful messages.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:03:59 -07:00