Adds a new return value to seccomp filters that triggers a SIGSYS to be
delivered with the new SYS_SECCOMP si_code.
This allows in-process system call emulation, including just specifying
an errno or cleanly dumping core, rather than just dying.
Suggested-by: Markus Gutschke <markus@chromium.org>
Suggested-by: Julien Tinnes <jln@chromium.org>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Eric Paris <eparis@redhat.com>
v18: - acked-by, rebase
- don't mention secure_computing_int() anymore
v15: - use audit_seccomp/skip
- pad out error spacing; clean up switch (indan@nul.nu)
v14: - n/a
v13: - rebase on to 88ebdda615
v12: - rebase on to linux-next
v11: - clarify the comment (indan@nul.nu)
- s/sigtrap/sigsys
v10: - use SIGSYS, syscall_get_arch, updates arch/Kconfig
note suggested-by (though original suggestion had other behaviors)
v9: - changes to SIGILL
v8: - clean up based on changes to dependent patches
v7: - introduction
Signed-off-by: James Morris <james.l.morris@oracle.com>
This change enables SIGSYS, defines _sigfields._sigsys, and adds
x86 (compat) arch support. _sigsys defines fields which allow
a signal handler to receive the triggering system call number,
the relevant AUDIT_ARCH_* value for that number, and the address
of the callsite.
SIGSYS is added to the SYNCHRONOUS_MASK because it is desirable for it
to have setup_frame() called for it. The goal is to ensure that
ucontext_t reflects the machine state from the time-of-syscall and not
from another signal handler.
The first consumer of SIGSYS would be seccomp filter. In particular,
a filter program could specify a new return value, SECCOMP_RET_TRAP,
which would result in the system call being denied and the calling
thread signaled. This also means that implementing arch-specific
support can be dependent upon HAVE_ARCH_SECCOMP_FILTER.
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Eric Paris <eparis@redhat.com>
v18: - added acked by, rebase
v17: - rebase and reviewed-by addition
v14: - rebase/nochanges
v13: - rebase on to 88ebdda615
v12: - reworded changelog (oleg@redhat.com)
v11: - fix dropped words in the change description
- added fallback copy_siginfo support.
- added __ARCH_SIGSYS define to allow stepped arch support.
v10: - first version based on suggestion
Signed-off-by: James Morris <james.l.morris@oracle.com>
This change adds the SECCOMP_RET_ERRNO as a valid return value from a
seccomp filter. Additionally, it makes the first use of the lower
16-bits for storing a filter-supplied errno. 16-bits is more than
enough for the errno-base.h calls.
Returning errors instead of immediately terminating processes that
violate seccomp policy allow for broader use of this functionality
for kernel attack surface reduction. For example, a linux container
could maintain a whitelist of pre-existing system calls but drop
all new ones with errnos. This would keep a logically static attack
surface while providing errnos that may allow for graceful failure
without the downside of do_exit() on a bad call.
This change also changes the signature of __secure_computing. It
appears the only direct caller is the arm entry code and it clobbers
any possible return value (register) immediately.
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Eric Paris <eparis@redhat.com>
v18: - fix up comments and rebase
- fix bad var name which was fixed in later revs
- remove _int() and just change the __secure_computing signature
v16-v17: ...
v15: - use audit_seccomp and add a skip label. (eparis@redhat.com)
- clean up and pad out return codes (indan@nul.nu)
v14: - no change/rebase
v13: - rebase on to 88ebdda615
v12: - move to WARN_ON if filter is NULL
(oleg@redhat.com, luto@mit.edu, keescook@chromium.org)
- return immediately for filter==NULL (keescook@chromium.org)
- change evaluation to only compare the ACTION so that layered
errnos don't result in the lowest one being returned.
(keeschook@chromium.org)
v11: - check for NULL filter (keescook@chromium.org)
v10: - change loaders to fn
v9: - n/a
v8: - update Kconfig to note new need for syscall_set_return_value.
- reordered such that TRAP behavior follows on later.
- made the for loop a little less indent-y
v7: - introduced
Signed-off-by: James Morris <james.l.morris@oracle.com>
[This patch depends on luto@mit.edu's no_new_privs patch:
https://lkml.org/lkml/2012/1/30/264
The whole series including Andrew's patches can be found here:
https://github.com/redpig/linux/tree/seccomp
Complete diff here:
https://github.com/redpig/linux/compare/1dc65fed...seccomp
]
This patch adds support for seccomp mode 2. Mode 2 introduces the
ability for unprivileged processes to install system call filtering
policy expressed in terms of a Berkeley Packet Filter (BPF) program.
This program will be evaluated in the kernel for each system call
the task makes and computes a result based on data in the format
of struct seccomp_data.
A filter program may be installed by calling:
struct sock_fprog fprog = { ... };
...
prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog);
The return value of the filter program determines if the system call is
allowed to proceed or denied. If the first filter program installed
allows prctl(2) calls, then the above call may be made repeatedly
by a task to further reduce its access to the kernel. All attached
programs must be evaluated before a system call will be allowed to
proceed.
Filter programs will be inherited across fork/clone and execve.
However, if the task attaching the filter is unprivileged
(!CAP_SYS_ADMIN) the no_new_privs bit will be set on the task. This
ensures that unprivileged tasks cannot attach filters that affect
privileged tasks (e.g., setuid binary).
There are a number of benefits to this approach. A few of which are
as follows:
- BPF has been exposed to userland for a long time
- BPF optimization (and JIT'ing) are well understood
- Userland already knows its ABI: system call numbers and desired
arguments
- No time-of-check-time-of-use vulnerable data accesses are possible.
- system call arguments are loaded on access only to minimize copying
required for system call policy decisions.
Mode 2 support is restricted to architectures that enable
HAVE_ARCH_SECCOMP_FILTER. In this patch, the primary dependency is on
syscall_get_arguments(). The full desired scope of this feature will
add a few minor additional requirements expressed later in this series.
Based on discussion, SECCOMP_RET_ERRNO and SECCOMP_RET_TRACE seem to be
the desired additional functionality.
No architectures are enabled in this patch.
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: Indan Zupancic <indan@nul.nu>
Acked-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
v18: - rebase to v3.4-rc2
- s/chk/check/ (akpm@linux-foundation.org,jmorris@namei.org)
- allocate with GFP_KERNEL|__GFP_NOWARN (indan@nul.nu)
- add a comment for get_u32 regarding endianness (akpm@)
- fix other typos, style mistakes (akpm@)
- added acked-by
v17: - properly guard seccomp filter needed headers (leann@ubuntu.com)
- tighten return mask to 0x7fff0000
v16: - no change
v15: - add a 4 instr penalty when counting a path to account for seccomp_filter
size (indan@nul.nu)
- drop the max insns to 256KB (indan@nul.nu)
- return ENOMEM if the max insns limit has been hit (indan@nul.nu)
- move IP checks after args (indan@nul.nu)
- drop !user_filter check (indan@nul.nu)
- only allow explicit bpf codes (indan@nul.nu)
- exit_code -> exit_sig
v14: - put/get_seccomp_filter takes struct task_struct
(indan@nul.nu,keescook@chromium.org)
- adds seccomp_chk_filter and drops general bpf_run/chk_filter user
- add seccomp_bpf_load for use by net/core/filter.c
- lower max per-process/per-hierarchy: 1MB
- moved nnp/capability check prior to allocation
(all of the above: indan@nul.nu)
v13: - rebase on to 88ebdda615
v12: - added a maximum instruction count per path (indan@nul.nu,oleg@redhat.com)
- removed copy_seccomp (keescook@chromium.org,indan@nul.nu)
- reworded the prctl_set_seccomp comment (indan@nul.nu)
v11: - reorder struct seccomp_data to allow future args expansion (hpa@zytor.com)
- style clean up, @compat dropped, compat_sock_fprog32 (indan@nul.nu)
- do_exit(SIGSYS) (keescook@chromium.org, luto@mit.edu)
- pare down Kconfig doc reference.
- extra comment clean up
v10: - seccomp_data has changed again to be more aesthetically pleasing
(hpa@zytor.com)
- calling convention is noted in a new u32 field using syscall_get_arch.
This allows for cross-calling convention tasks to use seccomp filters.
(hpa@zytor.com)
- lots of clean up (thanks, Indan!)
v9: - n/a
v8: - use bpf_chk_filter, bpf_run_filter. update load_fns
- Lots of fixes courtesy of indan@nul.nu:
-- fix up load behavior, compat fixups, and merge alloc code,
-- renamed pc and dropped __packed, use bool compat.
-- Added a hidden CONFIG_SECCOMP_FILTER to synthesize non-arch
dependencies
v7: (massive overhaul thanks to Indan, others)
- added CONFIG_HAVE_ARCH_SECCOMP_FILTER
- merged into seccomp.c
- minimal seccomp_filter.h
- no config option (part of seccomp)
- no new prctl
- doesn't break seccomp on systems without asm/syscall.h
(works but arg access always fails)
- dropped seccomp_init_task, extra free functions, ...
- dropped the no-asm/syscall.h code paths
- merges with network sk_run_filter and sk_chk_filter
v6: - fix memory leak on attach compat check failure
- require no_new_privs || CAP_SYS_ADMIN prior to filter
installation. (luto@mit.edu)
- s/seccomp_struct_/seccomp_/ for macros/functions (amwang@redhat.com)
- cleaned up Kconfig (amwang@redhat.com)
- on block, note if the call was compat (so the # means something)
v5: - uses syscall_get_arguments
(indan@nul.nu,oleg@redhat.com, mcgrathr@chromium.org)
- uses union-based arg storage with hi/lo struct to
handle endianness. Compromises between the two alternate
proposals to minimize extra arg shuffling and account for
endianness assuming userspace uses offsetof().
(mcgrathr@chromium.org, indan@nul.nu)
- update Kconfig description
- add include/seccomp_filter.h and add its installation
- (naive) on-demand syscall argument loading
- drop seccomp_t (eparis@redhat.com)
v4: - adjusted prctl to make room for PR_[SG]ET_NO_NEW_PRIVS
- now uses current->no_new_privs
(luto@mit.edu,torvalds@linux-foundation.com)
- assign names to seccomp modes (rdunlap@xenotime.net)
- fix style issues (rdunlap@xenotime.net)
- reworded Kconfig entry (rdunlap@xenotime.net)
v3: - macros to inline (oleg@redhat.com)
- init_task behavior fixed (oleg@redhat.com)
- drop creator entry and extra NULL check (oleg@redhat.com)
- alloc returns -EINVAL on bad sizing (serge.hallyn@canonical.com)
- adds tentative use of "always_unprivileged" as per
torvalds@linux-foundation.org and luto@mit.edu
v2: - (patch 2 only)
Signed-off-by: James Morris <james.l.morris@oracle.com>
Adds a stub for a function that will return the AUDIT_ARCH_* value
appropriate to the supplied task based on the system call convention.
For audit's use, the value can generally be hard-coded at the
audit-site. However, for other functionality not inlined into syscall
entry/exit, this makes that information available. seccomp_filter is
the first planned consumer and, as such, the comment indicates a tie to
CONFIG_HAVE_ARCH_SECCOMP_FILTER.
Suggested-by: Roland McGrath <mcgrathr@chromium.org>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Eric Paris <eparis@redhat.com>
v18: comment and change reword and rebase.
v14: rebase/nochanges
v13: rebase on to 88ebdda615
v12: rebase on to linux-next
v11: fixed improper return type
v10: introduced
Signed-off-by: James Morris <james.l.morris@oracle.com>
Replaces the seccomp_t typedef with struct seccomp to match modern
kernel style.
Signed-off-by: Will Drewry <wad@chromium.org>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Eric Paris <eparis@redhat.com>
v18: rebase
...
v14: rebase/nochanges
v13: rebase on to 88ebdda615
v12: rebase on to linux-next
v8-v11: no changes
v7: struct seccomp_struct -> struct seccomp
v6: original inclusion in this series.
Signed-off-by: James Morris <james.l.morris@oracle.com>
Any other users of bpf_*_filter that take a struct sock_fprog from
userspace will need to be able to also accept a compat_sock_fprog
if the arch supports compat calls. This change allows the existing
compat_sock_fprog be shared.
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Paris <eparis@redhat.com>
v18: tasered by the apostrophe police
v14: rebase/nochanges
v13: rebase on to 88ebdda615
v12: rebase on to linux-next
v11: introduction
Signed-off-by: James Morris <james.l.morris@oracle.com>
Introduces a new BPF ancillary instruction that all LD calls will be
mapped through when skb_run_filter() is being used for seccomp BPF. The
rewriting will be done using a secondary chk_filter function that is run
after skb_chk_filter.
The code change is guarded by CONFIG_SECCOMP_FILTER which is added,
along with the seccomp_bpf_load() function later in this series.
This is based on http://lkml.org/lkml/2012/3/2/141
Suggested-by: Indan Zupancic <indan@nul.nu>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Paris <eparis@redhat.com>
v18: rebase
...
v15: include seccomp.h explicitly for when seccomp_bpf_load exists.
v14: First cut using a single additional instruction
... v13: made bpf functions generic.
Signed-off-by: James Morris <james.l.morris@oracle.com>
With this change, calling
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)
disables privilege granting operations at execve-time. For example, a
process will not be able to execute a setuid binary to change their uid
or gid if this bit is set. The same is true for file capabilities.
Additionally, LSM_UNSAFE_NO_NEW_PRIVS is defined to ensure that
LSMs respect the requested behavior.
To determine if the NO_NEW_PRIVS bit is set, a task may call
prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
It returns 1 if set and 0 if it is not set. If any of the arguments are
non-zero, it will return -1 and set errno to -EINVAL.
(PR_SET_NO_NEW_PRIVS behaves similarly.)
This functionality is desired for the proposed seccomp filter patch
series. By using PR_SET_NO_NEW_PRIVS, it allows a task to modify the
system call behavior for itself and its child tasks without being
able to impact the behavior of a more privileged task.
Another potential use is making certain privileged operations
unprivileged. For example, chroot may be considered "safe" if it cannot
affect privileged tasks.
Note, this patch causes execve to fail when PR_SET_NO_NEW_PRIVS is
set and AppArmor is in use. It is fixed in a subsequent patch.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
v18: updated change desc
v17: using new define values as per 3.4
Signed-off-by: James Morris <james.l.morris@oracle.com>
Pull networking updates from David Miller:
1) Fix inaccuracies in network driver interface documentation, from Ben
Hutchings.
2) Fix handling of negative offsets in BPF JITs, from Jan Seiffert.
3) Compile warning, locking, and refcounting fixes in netfilter's
xt_CT, from Pablo Neira Ayuso.
4) phonet sendmsg needs to validate user length just like any other
datagram protocol, fix from Sasha Levin.
5) Ipv6 multicast code uses wrong loop index, from RongQing Li.
6) Link handling and firmware fixes in bnx2x driver from Yaniv Rosner
and Yuval Mintz.
7) mlx4 erroneously allocates 4 pages at a time, regardless of page
size, fix from Thadeu Lima de Souza Cascardo.
8) SCTP socket option wasn't extended in a backwards compatible way,
fix from Thomas Graf.
9) Add missing address change event emissions to bonding, from Shlomo
Pongratz.
10) /proc/net/dev regressed because it uses a private offset to track
where we are in the hash table, but this doesn't track the offset
pullback that the seq_file code does resulting in some entries being
missed in large dumps.
Fix from Eric Dumazet.
11) do_tcp_sendpage() unloads the send queue way too fast, because it
invokes tcp_push() when it shouldn't. Let the natural sequence
generated by the splice paths, and the assosciated MSG_MORE
settings, guide the tcp_push() calls.
Otherwise what goes out of TCP is spaghetti and doesn't batch
effectively into GSO/TSO clusters.
From Eric Dumazet.
12) Once we put a SKB into either the netlink receiver's queue or a
socket error queue, it can be consumed and freed up, therefore we
cannot touch it after queueing it like that.
Fixes from Eric Dumazet.
13) PPP has this annoying behavior in that for every transmit call it
immediately stops the TX queue, then calls down into the next layer
to transmit the PPP frame.
But if that next layer can take it immediately, it just un-stops the
TX queue right before returning from the transmit method.
Besides being useless work, it makes several facilities unusable, in
particular things like the equalizers. Well behaved devices should
only stop the TX queue when they really are full, and in PPP's case
when it gets backlogged to the downstream device.
David Woodhouse therefore fixed PPP to not stop the TX queue until
it's downstream can't take data any more.
14) IFF_UNICAST_FLT got accidently lost in some recent stmmac driver
changes, re-add. From Marc Kleine-Budde.
15) Fix link flaps in ixgbe, from Eric W. Multanen.
16) Descriptor writeback fixes in e1000e from Matthew Vick.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
net: fix a race in sock_queue_err_skb()
netlink: fix races after skb queueing
doc, net: Update ndo_start_xmit return type and values
doc, net: Remove instruction to set net_device::trans_start
doc, net: Update netdev operation names
doc, net: Update documentation of synchronisation for TX multiqueue
doc, net: Remove obsolete reference to dev->poll
ethtool: Remove exception to the requirement of holding RTNL lock
MAINTAINERS: update for Marvell Ethernet drivers
bonding: properly unset current_arp_slave on slave link up
phonet: Check input from user before allocating
tcp: tcp_sendpages() should call tcp_push() once
ipv6: fix array index in ip6_mc_add_src()
mlx4: allocate just enough pages instead of always 4 pages
stmmac: re-add IFF_UNICAST_FLT for dwmac1000
bnx2x: Clear MDC/MDIO warning message
bnx2x: Fix BCM57711+BCM84823 link issue
bnx2x: Clear BCM84833 LED after fan failure
bnx2x: Fix BCM84833 PHY FW version presentation
bnx2x: Fix link issue for BCM8727 boards.
...
Commit e52ac3398c ('net: Use device
model to get driver name in skb_gso_segment()') removed the only
in-tree caller of ethtool ops that doesn't hold the RTNL lock.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 2f53384424 (tcp: allow splice() to build full TSO packets) added
a regression for splice() calls using SPLICE_F_MORE.
We need to call tcp_flush() at the end of the last page processed in
tcp_sendpages(), or else transmits can be deferred and future sends
stall.
Add a new internal flag, MSG_SENDPAGE_NOTLAST, acting like MSG_MORE, but
with different semantic.
For all sendpage() providers, its a transparent change. Only
sock_sendpage() and tcp_sendpages() can differentiate the two different
flags provided by pipe_to_sendpage()
Reported-by: Tom Herbert <therbert@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail>com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge batch of fixes from Andrew Morton:
"The simple_open() cleanup was held back while I wanted for laggards to
merge things.
I still need to send a few checkpoint/restore patches. I've been
wobbly about merging them because I'm wobbly about the overall
prospects for success of the project. But after speaking with Pavel
at the LSF conference, it sounds like they're further toward
completion than I feared - apparently davem is at the "has stopped
complaining" stage regarding the net changes. So I need to go back
and re-review those patchs and their (lengthy) discussion."
* emailed from Andrew Morton <akpm@linux-foundation.org>: (16 patches)
memcg swap: use mem_cgroup_uncharge_swap fix
backlight: add driver for DA9052/53 PMIC v1
C6X: use set_current_blocked() and block_sigmask()
MAINTAINERS: add entry for sparse checker
MAINTAINERS: fix REMOTEPROC F: typo
alpha: use set_current_blocked() and block_sigmask()
simple_open: automatically convert to simple_open()
scripts/coccinelle/api/simple_open.cocci: semantic patch for simple_open()
libfs: add simple_open()
hugetlbfs: remove unregister_filesystem() when initializing module
drivers/rtc/rtc-88pm860x.c: fix rtc irq enable callback
fs/xattr.c:setxattr(): improve handling of allocation failures
fs/xattr.c:listxattr(): fall back to vmalloc() if kmalloc() failed
fs/xattr.c: suppress page allocation failure warnings from sys_listxattr()
sysrq: use SEND_SIG_FORCED instead of force_sig()
proc: fix mount -t proc -o AAA
Although mem_cgroup_uncharge_swap has an empty placeholder for
!CONFIG_CGROUP_MEM_RES_CTLR_SWAP the definition is placed in the
CONFIG_SWAP ifdef block so we are missing the same definition for
!CONFIG_SWAP which implies !CONFIG_CGROUP_MEM_RES_CTLR_SWAP.
This has not been an issue before, because mem_cgroup_uncharge_swap was
not called from !CONFIG_SWAP context. But Hugh Dickins has a cleanup
patch to call __mem_cgroup_commit_charge_swapin which is defined also
for !CONFIG_SWAP.
Let's move both the empty definition and declaration outside of the
CONFIG_SWAP block to avoid the following compilation error:
mm/memcontrol.c: In function '__mem_cgroup_commit_charge_swapin':
mm/memcontrol.c:2837: error: implicit declaration of function 'mem_cgroup_uncharge_swap'
if CONFIG_SWAP is disabled.
Reported-by: David Rientjes <rientjes@google.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
debugfs and a few other drivers use an open-coded version of
simple_open() to pass a pointer from the file to the read/write file
ops. Add support for this simple case to libfs so that we can remove
the many duplicate copies of this simple function.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull CIFS fixes from Steve French.
* git://git.samba.org/sfrench/cifs-2.6:
Fix UNC parsing on mount
Remove unnecessary check for NULL in password parser
CIFS: Fix VFS lock usage for oplocked files
Revert "CIFS: Fix VFS lock usage for oplocked files"
cifs: writing past end of struct in cifs_convert_address()
cifs: silence compiler warnings showing up with gcc-4.7.0
CIFS: Fix VFS lock usage for oplocked files
Pull KGDB/KDB regression fixes from Jason Wessel:
- Fix a Smatch warning that appeared in the 3.4 merge window
- Fix kgdb test suite with SMP for all archs without HW single stepping
- Fix kgdb sw breakpoints with CONFIG_DEBUG_RODATA=y limitations on x86
- Fix oops on kgdb test suite with CONFIG_DEBUG_RODATA
- Fix kgdb test suite with SMP for all archs with HW single stepping
* tag 'for_linus-3.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
x86,kgdb: Fix DEBUG_RODATA limitation using text_poke()
kgdb,debug_core: pass the breakpoint struct instead of address and memory
kgdbts: (2 of 2) fix single step awareness to work correctly with SMP
kgdbts: (1 of 2) fix single step awareness to work correctly with SMP
kgdbts: Fix kernel oops with CONFIG_DEBUG_RODATA
kdb: Fix smatch warning on dbg_io_ops->is_console
Pull DMA mapping branch from Marek Szyprowski:
"Short summary for the whole series:
A few limitations have been identified in the current dma-mapping
design and its implementations for various architectures. There exist
more than one function for allocating and freeing the buffers:
currently these 3 are used dma_{alloc, free}_coherent,
dma_{alloc,free}_writecombine, dma_{alloc,free}_noncoherent.
For most of the systems these calls are almost equivalent and can be
interchanged. For others, especially the truly non-coherent ones
(like ARM), the difference can be easily noticed in overall driver
performance. Sadly not all architectures provide implementations for
all of them, so the drivers might need to be adapted and cannot be
easily shared between different architectures. The provided patches
unify all these functions and hide the differences under the already
existing dma attributes concept. The thread with more references is
available here:
http://www.spinics.net/lists/linux-sh/msg09777.html
These patches are also a prerequisite for unifying DMA-mapping
implementation on ARM architecture with the common one provided by
dma_map_ops structure and extending it with IOMMU support. More
information is available in the following thread:
http://thread.gmane.org/gmane.linux.kernel.cross-arch/12819
More works on dma-mapping framework are planned, especially in the
area of buffer sharing and managing the shared mappings (together with
the recently introduced dma_buf interface: commit d15bd7ee44
"dma-buf: Introduce dma buffer sharing mechanism").
The patches in the current set introduce a new alloc/free methods
(with support for memory attributes) in dma_map_ops structure, which
will later replace dma_alloc_coherent and dma_alloc_writecombine
functions."
People finally started piping up with support for merging this, so I'm
merging it as the last of the pending stuff from the merge window.
Looks like pohmelfs is going to wait for 3.5 and more external support
for merging.
* 'for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
common: DMA-mapping: add NON-CONSISTENT attribute
common: DMA-mapping: add WRITE_COMBINE attribute
common: dma-mapping: introduce mmap method
common: dma-mapping: remove old alloc_coherent and free_coherent methods
Hexagon: adapt for dma_map_ops changes
Unicore32: adapt for dma_map_ops changes
Microblaze: adapt for dma_map_ops changes
SH: adapt for dma_map_ops changes
Alpha: adapt for dma_map_ops changes
SPARC: adapt for dma_map_ops changes
PowerPC: adapt for dma_map_ops changes
MIPS: adapt for dma_map_ops changes
X86 & IA64: adapt for dma_map_ops changes
common: dma-mapping: introduce generic alloc() and free() methods
Pull more power management updates from Rafael Wysocki:
- Patch series that hopefully fixes races between the freezer and
request_firmware() and request_firmware_nowait() for good, with two
cleanups from Stephen Boyd on top.
- Runtime PM fix from Alan Stern preventing tasks from getting stuck
indefinitely in the runtime PM wait queue.
- Device PM QoS update from MyungJoo Ham introducing a new variant of
pm_qos_update_request() allowing the callers to specify a timeout.
* tag 'pm-for-3.4-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / QoS: add pm_qos_update_request_timeout() API
firmware_class: Move request_firmware_nowait() to workqueues
firmware_class: Reorganize fw_create_instance()
PM / Sleep: Mitigate race between the freezer and request_firmware()
PM / Sleep: Move disabling of usermode helpers to the freezer
PM / Hibernate: Disable usermode helpers right before freezing tasks
firmware_class: Do not warn that system is not ready from async loads
firmware_class: Split _request_firmware() into three functions, v2
firmware_class: Rework usermodehelper check
PM / Runtime: don't forget to wake up waitqueue on failure
Merge common_audit_data cleanup patches from Eric Paris.
This is really too late, but it's a long-overdue cleanup of the costly
wrapper functions for the security layer.
The "struct common_audit_data" is used all over in critical paths,
allocated and initialized on the stack. And used to be much too large,
causing not only unnecessarily big stack frames but the clearing of the
(mostly useless) data was also very visible in profiles.
As a particular example, in one microbenchmark for just doing "stat()"
over files a lot, selinux_inode_permission() used 7% of the CPU time.
That's despite the fact that it doesn't actually *do* anything: it is
just a helper wrapper function in the selinux security layer.
This patch-series shrinks "struct common_audit_data" sufficiently that
code generation for these kinds of wrapper functions is improved
noticeably, and we spend much less time just initializing data that we
will never use.
The functions still get called all the time, and it still shows up at
3.5+% in my microbenchmark, but it's quite a bit lower down the list,
and much less noticeable.
* Emailed patches from Eric Paris <eparis@redhat.com>:
lsm_audit: don't specify the audit pre/post callbacks in 'struct common_audit_data'
SELinux: do not allocate stack space for AVC data unless needed
SELinux: remove avd from slow_avc_audit()
SELinux: remove avd from selinux_audit_data
LSM: shrink the common_audit_data data union
LSM: shrink sizeof LSM specific portion of common_audit_data
Pull regulator fixes from Mark Brown:
"A bunch of smallish fixes that came up during the merge window as
things got more testing - even more fixes from Axel, a fix for error
handling in more complex systems using -EPROBE_DEFER and a couple of
small fixes for the new dummy regulators."
* tag 'regulator-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: Remove non-existent parameter from fixed-helper.c kernel doc
regulator: Fix setting new voltage in s5m8767_set_voltage
regulator: fix sysfs name collision between dummy and fixed dummy regulator
regulator: Fix deadlock on removal of regulators with supplies
regulator: Fix comments in include/linux/regulator/machine.h
regulator: Only update [LDOx|DCx]_HIB_MODE bits in wm8350_[ldo|dcdc]_set_suspend_disable
regulator: Fix setting low power mode for wm831x aldo
regulator: Return microamps in wm8350_isink_get_current
regulator: wm8350: Fix the logic to choose best current limit setting
regulator: wm831x-isink: Fix the logic to choose best current limit setting
regulator: wm831x-dcdc: Fix the logic to choose best current limit setting
regulator: anatop: patching to device-tree property "reg".
regulator: Do proper shift to set correct bit for DC[2|5]_HIB_MODE setting
regulator: Fix restoring pmic.dcdcx_hib_mode settings in wm8350_dcdc_set_suspend_enable
regulator: Fix unbalanced lock/unlock in mc13892_regulator_probe error path
regulator: Fix set and get current limit for wm831x_buckv
regulator: tps6586x: Fix list minimal voltage setting for LDO0
Commit f04565ddf5 (dev: use name hash for dev_seq_ops) added a second
regression, as some devices are missing from /proc/net/dev if many
devices are defined.
When seq_file buffer is filled, the last ->next/show() method is
canceled (pos value is reverted to value prior ->next() call)
Problem is after above commit, we dont restart the lookup at right
position in ->start() method.
Fix this by removing the internal 'pos' pointer added in commit, since
we need to use the 'loff_t *pos' provided by seq_file layer.
This also reverts commit 5cac98dd0 (net: Fix corruption
in /proc/*/net/dev_mcast), since its not needed anymore.
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Mihai Maruseac <mmaruseac@ixiacom.com>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>