Commit Graph

336536 Commits

Author SHA1 Message Date
Eric W. Biederman dfc47ef863 net: Push capable(CAP_NET_ADMIN) into the rtnl methods
- In rtnetlink_rcv_msg convert the capable(CAP_NET_ADMIN) check
  to ns_capable(net->user-ns, CAP_NET_ADMIN).  Allowing unprivileged
  users to make netlink calls to modify their local network
  namespace.

- In the rtnetlink doit methods add capable(CAP_NET_ADMIN) so
  that calls that are not safe for unprivileged users are still
  protected.

Later patches will remove the extra capable calls from methods
that are safe for unprivilged users.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:32:44 -05:00
Eric W. Biederman 464dc801c7 net: Don't export sysctls to unprivileged users
In preparation for supporting the creation of network namespaces
by unprivileged users, modify all of the per net sysctl exports
and refuse to allow them to unprivileged users.

This makes it safe for unprivileged users in general to access
per net sysctls, and allows sysctls to be exported to unprivileged
users on an individual basis as they are deemed safe.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:30:55 -05:00
Eric W. Biederman 73f7ef4359 sysctl: Pass useful parameters to sysctl permissions
- Current is implicitly avaiable so passing current->nsproxy isn't useful.
- The ctl_table_header is needed to find how the sysctl table is connected
  to the rest of sysctl.
- ctl_table_root is avaiable in the ctl_table_header so no need to it.

With these changes it becomes possible to write a version of
net_sysctl_permission that takes into account the network namespace of
the sysctl table, an important feature in extending the user namespace.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:30:55 -05:00
Eric W. Biederman d328b83682 userns: make each net (net_ns) belong to a user_ns
The user namespace which creates a new network namespace owns that
namespace and all resources created in it.  This way we can target
capability checks for privileged operations against network resources to
the user_ns which created the network namespace in which the resource
lives.  Privilege to the user namespace which owns the network
namespace, or any parent user namespace thereof, provides the same
privilege to the network resource.

This patch is reworked from a version originally by
Serge E. Hallyn <serge.hallyn@canonical.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:30:55 -05:00
Eric W. Biederman 2407dc25f7 netns: Deduplicate and fix copy_net_ns when !CONFIG_NET_NS
The copy of copy_net_ns used when the network stack is not
built is broken as it does not return -EINVAL when attempting
to create a new network namespace.  We don't even have
a previous network namespace.

Since we need a copy of copy_net_ns in net/net_namespace.h that is
available when the networking stack is not built at all move the
correct version of copy_net_ns from net_namespace.c into net_namespace.h
Leaving us with just 2 versions of copy_net_ns.  One version for when
we compile in network namespace suport and another stub for all other
occasions.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:30:55 -05:00
Vlad Yasevich 75fe83c322 ipv6: Preserve ipv6 functionality needed by NET
Some pieces of network use core pieces of IPv6 stack.  Keep
them available while letting new GSO offload pieces depend
on CONFIG_INET.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 02:34:00 -05:00
Sony Chacko 6d973cb163 qlcnic: fix sparse warnings
qlcnic_hw.c:1337:17: warning: cast removes address space of expression
qlcnic_hw.c:1337:17: warning: incorrect type in argument 2 (different address spaces)
qlcnic_hw.c:1337:17:    expected void volatile [noderef] <asn:2>*addr
qlcnic_hw.c:1337:17:    got void *<noident>
qlcnic_hw.c:1337:17: warning: cast removes address space of expression
qlcnic_hw.c:1337:17: warning: incorrect type in argument 1 (different address spaces)
qlcnic_hw.c:1337:17:    expected void const volatile [noderef] <asn:2>*addr
qlcnic_hw.c:1337:17:    got void *<noident>

The above warnings are originating from the macros QLCNIC_RD_DUMP_REG and
QLCNIC_WR_DUMP_REG.
The warnings are fixed and macros are replaced with equivalent functions
in the only file from where it is called.

The following warnings are fixed by making the functions static.

qlcnic_hw.c:543:5: warning: symbol 'qlcnic_set_fw_loopback' was not declared. Should it be static?
qlcnic_init.c:1853:6: warning: symbol 'qlcnic_process_rcv_diag' was not declared. Should it be static?

Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 02:32:16 -05:00
Sony Chacko 5ad6ff9d85 qlcnic: fix compiler warnings
Fix the following warnings:

qlcnic_main.c: In function 'qlcnic_update_cmd_producer':
qlcnic_main.c:119:51: warning: unused parameter 'adapter' [-Wunused-parameter]
qlcnic_main.c:119: warning: unused parameter adapter
qlcnic_init.c: In function qlcnic_process_lro
qlcnic_init.c:1586: warning: unused parameter sds_ring
qlcnic_init.c: In function qlcnic_process_rcv_diag
qlcnic_init.c:1854: warning: unused parameter sds_ring
qlcnic_init.c: In function qlcnic_fetch_mac
qlcnic_init.c:1938: warning: unused parameter adapter

warning: 'pci_using_dac' may be used uninitialized in this function [-Wmaybe-uninitialized]
qlcnic_main.c:1569:10: note: 'pci_using_dac' was declared here

Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 02:32:16 -05:00
Daniel Borkmann 5082dfb716 PPC: net: bpf_jit_comp: add VLAN instructions for BPF JIT
This patch is a follow-up for patch "net: filter: add vlan tag access"
to support the new VLAN_TAG/VLAN_TAG_PRESENT accessors in BPF JIT.

Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Cc: Matt Evans <matt@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-17 22:12:47 -05:00
Daniel Borkmann 02871903a1 PPC: net: bpf_jit_comp: add XOR instruction for BPF JIT
This patch is a follow-up for patch "filter: add XOR instruction for use
with X/K" that implements BPF PowerPC JIT parts for the BPF XOR operation.

Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Cc: Matt Evans <matt@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-17 22:12:47 -05:00
Ben Hutchings 18d359ceb0 pch_gbe, ptp_pch: Fix the dependency direction between these drivers
In commit a24006ed12 ('ptp: Enable clock
drivers along with associated net/PHY drivers') I wrongly made
PTP_1588_CLOCK_PCH depend on PCH_GBE.  The dependency is really the
other way around.  Therefore make PCH_GBE select PTP_1588_CLOCK_PCH
and remove the 'default y' from the latter.

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-17 22:11:45 -05:00
Rami Rosen e8e55d9514 vxlan: remove unused variable.
This patch removes addrexceeded member from vxlan_dev struct as it is unused.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-17 22:02:19 -05:00
Akinobu Mita fc184f0892 sctp: use bitmap_weight
Use bitmap_weight to count the total number of bits set in bitmap.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Cc: linux-sctp@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-17 22:01:18 -05:00
David S. Miller 67f4efdce7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor line offset auto-merges.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-17 22:00:43 -05:00
Linus Torvalds f4a75d2eb7 Linux 3.7-rc6 2012-11-16 17:42:40 -08:00
Linus Torvalds 51844b0f04 Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fix from Marcelo Tosatti:
 "A correction for oops on module init with older Intel hosts."

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Fix invalid secondary exec controls in vmx_cpuid_update()
2012-11-16 16:49:10 -08:00
Linus Torvalds 0cad3ff404 Merge branch 'akpm' (Fixes from Andrew)
Merge misc fixes from Andrew Morton.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (12 patches)
  revert "mm: fix-up zone present pages"
  tmpfs: change final i_blocks BUG to WARNING
  tmpfs: fix shmem_getpage_gfp() VM_BUG_ON
  mm: highmem: don't treat PKMAP_ADDR(LAST_PKMAP) as a highmem address
  mm: revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures"
  rapidio: fix kernel-doc warnings
  swapfile: fix name leak in swapoff
  memcg: fix hotplugged memory zone oops
  mips, arc: fix build failure
  memcg: oom: fix totalpages calculation for memory.swappiness==0
  mm: fix build warning for uninitialized value
  mm: add anon_vma_lock to validate_mm()
2012-11-16 15:26:38 -08:00
Andrew Morton 5576646f3c revert "mm: fix-up zone present pages"
Revert commit 7f1290f2f2 ("mm: fix-up zone present pages")

That patch tried to fix a issue when calculating zone->present_pages,
but it caused a regression on 32bit systems with HIGHMEM.  With that
change, reset_zone_present_pages() resets all zone->present_pages to
zero, and fixup_zone_present_pages() is called to recalculate
zone->present_pages when the boot allocator frees core memory pages into
buddy allocator.  Because highmem pages are not freed by bootmem
allocator, all highmem zones' present_pages becomes zero.

Various options for improving the situation are being discussed but for
now, let's return to the 3.6 code.

Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-16 14:33:04 -08:00
Hugh Dickins 0f3c42f522 tmpfs: change final i_blocks BUG to WARNING
Under a particular load on one machine, I have hit shmem_evict_inode()'s
BUG_ON(inode->i_blocks), enough times to narrow it down to a particular
race between swapout and eviction.

It comes from the "if (freed > 0)" asymmetry in shmem_recalc_inode(),
and the lack of coherent locking between mapping's nrpages and shmem's
swapped count.  There's a window in shmem_writepage(), between lowering
nrpages in shmem_delete_from_page_cache() and then raising swapped
count, when the freed count appears to be +1 when it should be 0, and
then the asymmetry stops it from being corrected with -1 before hitting
the BUG.

One answer is coherent locking: using tree_lock throughout, without
info->lock; reasonable, but the raw_spin_lock in percpu_counter_add() on
used_blocks makes that messier than expected.  Another answer may be a
further effort to eliminate the weird shmem_recalc_inode() altogether,
but previous attempts at that failed.

So far undecided, but for now change the BUG_ON to WARN_ON: in usual
circumstances it remains a useful consistency check.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-16 14:33:04 -08:00
Hugh Dickins 215c02bc33 tmpfs: fix shmem_getpage_gfp() VM_BUG_ON
Fuzzing with trinity hit the "impossible" VM_BUG_ON(error) (which Fedora
has converted to WARNING) in shmem_getpage_gfp():

  WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
  Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
  Call Trace:
    warn_slowpath_common+0x7f/0xc0
    warn_slowpath_null+0x1a/0x20
    shmem_getpage_gfp+0xa5c/0xa70
    shmem_fault+0x4f/0xa0
    __do_fault+0x71/0x5c0
    handle_pte_fault+0x97/0xae0
    handle_mm_fault+0x289/0x350
    __do_page_fault+0x18e/0x530
    do_page_fault+0x2b/0x50
    page_fault+0x28/0x30
    tracesys+0xe1/0xe6

Thanks to Johannes for pointing to truncation: free_swap_and_cache()
only does a trylock on the page, so the page lock we've held since
before confirming swap is not enough to protect against truncation.

What cleanup is needed in this case? Just delete_from_swap_cache(),
which takes care of the memcg uncharge.

Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Dave Jones <davej@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-16 14:33:04 -08:00
Will Deacon 498c228021 mm: highmem: don't treat PKMAP_ADDR(LAST_PKMAP) as a highmem address
kmap_to_page returns the corresponding struct page for a virtual address
of an arbitrary mapping.  This works by checking whether the address
falls in the pkmap region and using the pkmap page tables instead of the
linear mapping if appropriate.

Unfortunately, the bounds checking means that PKMAP_ADDR(LAST_PKMAP) is
incorrectly treated as a highmem address and we can end up walking off
the end of pkmap_page_table and subsequently passing junk to pte_page.

This patch fixes the bound check to stay within the pkmap tables.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-16 14:33:04 -08:00
Mel Gorman 96710098ee mm: revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures"
Jiri Slaby reported the following:

	(It's an effective revert of "mm: vmscan: scale number of pages
	reclaimed by reclaim/compaction based on failures".) Given kswapd
	had hours of runtime in ps/top output yesterday in the morning
	and after the revert it's now 2 minutes in sum for the last 24h,
	I would say, it's gone.

The intention of the patch in question was to compensate for the loss of
lumpy reclaim.  Part of the reason lumpy reclaim worked is because it
aggressively reclaimed pages and this patch was meant to be a sane
compromise.

When compaction fails, it gets deferred and both compaction and
reclaim/compaction is deferred avoid excessive reclaim.  However, since
commit c654345924 ("mm: remove __GFP_NO_KSWAPD"), kswapd is woken up
each time and continues reclaiming which was not taken into account when
the patch was developed.

Attempts to address the problem ended up just changing the shape of the
problem instead of fixing it.  The release window gets closer and while
a THP allocation failing is not a major problem, kswapd chewing up a lot
of CPU is.

This patch reverts commit 83fde0f228 ("mm: vmscan: scale number of
pages reclaimed by reclaim/compaction based on failures") and will be
revisited in the future.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Zdenek Kabelac <zkabelac@redhat.com>
Tested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-16 14:33:04 -08:00
Randy Dunlap 2ca3cb50ed rapidio: fix kernel-doc warnings
Fix rapidio kernel-doc warnings:

  Warning(drivers/rapidio/rio.c:415): No description found for parameter 'local'
  Warning(drivers/rapidio/rio.c:415): Excess function parameter 'lstart' description in 'rio_map_inb_region'
  Warning(include/linux/rio.h:290): No description found for parameter 'switches'
  Warning(include/linux/rio.h:290): No description found for parameter 'destid_table'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Acked-by: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-16 14:33:04 -08:00
Xiaotian Feng f58b59c1df swapfile: fix name leak in swapoff
There's a name leak introduced by commit 91a27b2a75 ("vfs: define
struct filename and have getname() return it").  Add the missing
putname.

[akpm@linux-foundation.org: cleanup]
Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-16 14:33:04 -08:00
Hugh Dickins bea8c150a7 memcg: fix hotplugged memory zone oops
When MEMCG is configured on (even when it's disabled by boot option),
when adding or removing a page to/from its lru list, the zone pointer
used for stats updates is nowadays taken from the struct lruvec.  (On
many configurations, calculating zone from page is slower.)

But we have no code to update all the lruvecs (per zone, per memcg) when
a memory node is hotadded.  Here's an extract from the oops which
results when running numactl to bind a program to a newly onlined node:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000f60
  IP:  __mod_zone_page_state+0x9/0x60
  Pid: 1219, comm: numactl Not tainted 3.6.0-rc5+ #180 Bochs Bochs
  Process numactl (pid: 1219, threadinfo ffff880039abc000, task ffff8800383c4ce0)
  Call Trace:
    __pagevec_lru_add_fn+0xdf/0x140
    pagevec_lru_move_fn+0xb1/0x100
    __pagevec_lru_add+0x1c/0x30
    lru_add_drain_cpu+0xa3/0x130
    lru_add_drain+0x2f/0x40
   ...

The natural solution might be to use a memcg callback whenever memory is
hotadded; but that solution has not been scoped out, and it happens that
we do have an easy location at which to update lruvec->zone.  The lruvec
pointer is discovered either by mem_cgroup_zone_lruvec() or by
mem_cgroup_page_lruvec(), and both of those do know the right zone.

So check and set lruvec->zone in those; and remove the inadequate
attempt to set lruvec->zone from lruvec_init(), which is called before
NODE_DATA(node) has been allocated in such cases.

Ah, there was one exceptionr.  For no particularly good reason,
mem_cgroup_force_empty_list() has its own code for deciding lruvec.
Change it to use the standard mem_cgroup_zone_lruvec() and
mem_cgroup_get_lru_size() too.  In fact it was already safe against such
an oops (the lru lists in danger could only be empty), but we're better
proofed against future changes this way.

I've marked this for stable (3.6) since we introduced the problem in 3.5
(now closed to stable); but I have no idea if this is the only fix
needed to get memory hotadd working with memcg in 3.6, and received no
answer when I enquired twice before.

Reported-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-16 14:33:04 -08:00