Pull vfs pile (part one) from Al Viro:
"Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
locking violations, etc.
The most visible changes here are death of FS_REVAL_DOT (replaced with
"has ->d_weak_revalidate()") and a new helper getting from struct file
to inode. Some bits of preparation to xattr method interface changes.
Misc patches by various people sent this cycle *and* ocfs2 fixes from
several cycles ago that should've been upstream right then.
PS: the next vfs pile will be xattr stuff."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
saner proc_get_inode() calling conventions
proc: avoid extra pde_put() in proc_fill_super()
fs: change return values from -EACCES to -EPERM
fs/exec.c: make bprm_mm_init() static
ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
ocfs2: fix possible use-after-free with AIO
ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
target: writev() on single-element vector is pointless
export kernel_write(), convert open-coded instances
fs: encode_fh: return FILEID_INVALID if invalid fid_type
kill f_vfsmnt
vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
nfsd: handle vfs_getattr errors in acl protocol
switch vfs_getattr() to struct path
default SET_PERSONALITY() in linux/elf.h
ceph: prepopulate inodes only when request is aborted
d_hash_and_lookup(): export, switch open-coded instances
9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
9p: split dropping the acls from v9fs_set_create_acl()
...
Pull signal handling cleanups from Al Viro:
"This is the first pile; another one will come a bit later and will
contain SYSCALL_DEFINE-related patches.
- a bunch of signal-related syscalls (both native and compat)
unified.
- a bunch of compat syscalls switched to COMPAT_SYSCALL_DEFINE
(fixing several potential problems with missing argument
validation, while we are at it)
- a lot of now-pointless wrappers killed
- a couple of architectures (cris and hexagon) forgot to save
altstack settings into sigframe, even though they used the
(uninitialized) values in sigreturn; fixed.
- microblaze fixes for delivery of multiple signals arriving at once
- saner set of helpers for signal delivery introduced, several
architectures switched to using those."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (143 commits)
x86: convert to ksignal
sparc: convert to ksignal
arm: switch to struct ksignal * passing
alpha: pass k_sigaction and siginfo_t using ksignal pointer
burying unused conditionals
make do_sigaltstack() static
arm64: switch to generic old sigaction() (compat-only)
arm64: switch to generic compat rt_sigaction()
arm64: switch compat to generic old sigsuspend
arm64: switch to generic compat rt_sigqueueinfo()
arm64: switch to generic compat rt_sigpending()
arm64: switch to generic compat rt_sigprocmask()
arm64: switch to generic sigaltstack
sparc: switch to generic old sigsuspend
sparc: COMPAT_SYSCALL_DEFINE does all sign-extension as well as SYSCALL_DEFINE
sparc: kill sign-extending wrappers for native syscalls
kill sparc32_open()
sparc: switch to use of generic old sigaction
sparc: switch sys_compat_rt_sigaction() to COMPAT_SYSCALL_DEFINE
mips: switch to generic sys_fork() and sys_clone()
...
Pull networking update from David Miller:
1) Checkpoint/restarted TCP sockets now can properly propagate the TCP
timestamp offset. From Andrey Vagin.
2) VMWARE VM VSOCK layer, from Andy King.
3) Much improved support for virtual functions and SR-IOV in bnx2x,
from Ariel ELior.
4) All protocols on ipv4 and ipv6 are now network namespace aware, and
all the compatability checks for initial-namespace-only protocols is
removed. Thanks to Tom Parkin for helping deal with the last major
holdout, L2TP.
5) IPV6 support in netpoll and network namespace support in pktgen,
from Cong Wang.
6) Multiple Registration Protocol (MRP) and Multiple VLAN Registration
Protocol (MVRP) support, from David Ward.
7) Compute packet lengths more accurately in the packet scheduler, from
Eric Dumazet.
8) Use per-task page fragment allocator in skb_append_datato_frags(),
also from Eric Dumazet.
9) Add support for connection tracking labels in netfilter, from
Florian Westphal.
10) Fix default multicast group joining on ipv6, and add anti-spoofing
checks to 6to4 and 6rd. From Hannes Frederic Sowa.
11) Make ipv4/ipv6 fragmentation memory limits more reasonable in modern
times, rearrange inet frag datastructures for better cacheline
locality, and move more operations outside of locking. From Jesper
Dangaard Brouer.
12) Instead of strict master <--> slave relationships, allow arbitrary
scenerios with "upper device lists". From Jiri Pirko.
13) Improve rate limiting accuracy in TBF and act_police, also from Jiri
Pirko.
14) Add a BPF filter netfilter match target, from Willem de Bruijn.
15) Orphan and delete a bunch of pre-historic networking drivers from
Paul Gortmaker.
16) Add TSO support for GRE tunnels, from Pravin B SHelar. Although
this still needs some minor bug fixing before it's %100 correct in
all cases.
17) Handle unresolved IPSEC states like ARP, with a resolution packet
queue. From Steffen Klassert.
18) Remove TCP Appropriate Byte Count support (ABC), from Stephen
Hemminger. This was long overdue.
19) Support SO_REUSEPORT, from Tom Herbert.
20) Allow locking a socket BPF filter, so that it cannot change after a
process drops capabilities.
21) Add VLAN filtering to bridge, from Vlad Yasevich.
22) Bring ipv6 on-par with ipv4 and do not cache neighbour entries in
the ipv6 routes, from YOSHIFUJI Hideaki.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1538 commits)
ipv6: fix race condition regarding dst->expires and dst->from.
net: fix a wrong assignment in skb_split()
ip_gre: remove an extra dst_release()
ppp: set qdisc_tx_busylock to avoid LOCKDEP splat
atl1c: restore buffer state
net: fix a build failure when !CONFIG_PROC_FS
net: ipv4: fix waring -Wunused-variable
net: proc: fix build failed when procfs is not configured
Revert "xen: netback: remove redundant xenvif_put"
net: move procfs code to net/core/net-procfs.c
qmi_wwan, cdc-ether: add ADU960S
bonding: set sysfs device_type to 'bond'
bonding: fix bond_release_all inconsistencies
b44: use netdev_alloc_skb_ip_align()
xen: netback: remove redundant xenvif_put
net: fec: Do a sanity check on the gpio number
ip_gre: propogate target device GSO capability to the tunnel device
ip_gre: allow CSUM capable devices to handle packets
bonding: Fix initialize after use for 3ad machine state spinlock
bonding: Fix race condition between bond_enslave() and bond_3ad_update_lacp_rate()
...
Pull sparc updates from David Miller:
"Mostly more sparc64 THP bug fixes, and a refactoring of SMP bootup on
sparc32 from Sam Ravnborg."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc32: refactor smp boot
sparc64: Fix huge PMD to PTE translation for sun4u in TLB miss handler.
sparc64: Fix tsb_grow() in atomic context.
sparc64: Handle hugepage TSB being NULL.
sparc64: Fix gfp_flags setting in tsb_grow().
When we set the sun4u version of the PTE execute bit, it's:
or REG, _PAGE_EXEC_4U, REG
_PAGE_EXEC_4U is 0x1000, unfortunately the immedate field of the
'or' instruction is a signed 13-bit value. So the above actually
assembles into:
or REG, -4096, REG
completely corrupting the final PTE value.
Set it with a:
sethi %hi(_PAGE_EXEC_4U), TMP
or REG, TMP, REG
sequence instead.
This fixes "git gc" crashes on sun4u machines.
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull ACPI and power management updates from Rafael Wysocki:
- Rework of the ACPI namespace scanning code from Rafael J. Wysocki
with contributions from Bjorn Helgaas, Jiang Liu, Mika Westerberg,
Toshi Kani, and Yinghai Lu.
- ACPI power resources handling and ACPI device PM update from Rafael
J Wysocki.
- ACPICA update to version 20130117 from Bob Moore and Lv Zheng with
contributions from Aaron Lu, Chao Guan, Jesper Juhl, and Tim Gardner.
- Support for Intel Lynxpoint LPSS from Mika Westerberg.
- cpuidle update from Len Brown including Intel Haswell support, C1
state for intel_idle, removal of global pm_idle.
- cpuidle fixes and cleanups from Daniel Lezcano.
- cpufreq fixes and cleanups from Viresh Kumar and Fabio Baltieri with
contributions from Stratos Karafotis and Rickard Andersson.
- Intel P-states driver for Sandy Bridge processors from Dirk
Brandewie.
- cpufreq driver for Marvell Kirkwood SoCs from Andrew Lunn.
- cpufreq fixes related to ordering issues between acpi-cpufreq and
powernow-k8 from Borislav Petkov and Matthew Garrett.
- cpufreq support for Calxeda Highbank processors from Mark Langsdorf
and Rob Herring.
- cpufreq driver for the Freescale i.MX6Q SoC and cpufreq-cpu0 update
from Shawn Guo.
- cpufreq Exynos fixes and cleanups from Jonghwan Choi, Sachin Kamat,
and Inderpal Singh.
- Support for "lightweight suspend" from Zhang Rui.
- Removal of the deprecated power trace API from Paul Gortmaker.
- Assorted updates from Andreas Fleig, Colin Ian King, Davidlohr Bueso,
Joseph Salisbury, Kees Cook, Li Fei, Nishanth Menon, ShuoX Liu,
Srinivas Pandruvada, Tejun Heo, Thomas Renninger, and Yasuaki
Ishimatsu.
* tag 'pm+acpi-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (267 commits)
PM idle: remove global declaration of pm_idle
unicore32 idle: delete stray pm_idle comment
openrisc idle: delete pm_idle
mn10300 idle: delete pm_idle
microblaze idle: delete pm_idle
m32r idle: delete pm_idle, and other dead idle code
ia64 idle: delete pm_idle
cris idle: delete idle and pm_idle
ARM64 idle: delete pm_idle
ARM idle: delete pm_idle
blackfin idle: delete pm_idle
sparc idle: rename pm_idle to sparc_idle
sh idle: rename global pm_idle to static sh_idle
x86 idle: rename global pm_idle to static x86_idle
APM idle: register apm_cpu_idle via cpuidle
cpufreq / intel_pstate: Add kernel command line option disable intel_pstate.
cpufreq / intel_pstate: Change to disallow module build
tools/power turbostat: display SMI count by default
intel_idle: export both C1 and C1E
ACPI / hotplug: Fix concurrency issues and memory leaks
...
If our first THP installation for an MM is via the set_pmd_at() done
during khugepaged's collapsing we'll end up in tsb_grow() trying to do
a GFP_KERNEL allocation with several locks held.
Simply using GFP_ATOMIC in this situation is not the best option
because we really can't have this fail, so we'd really like to keep
this an order 0 GFP_KERNEL allocation if possible.
Also, doing the TSB allocation from khugepaged is a really bad idea
because we'll allocate it potentially from the wrong NUMA node in that
context.
So what we do is defer the hugepage TSB allocation until the first TLB
miss we take on a hugepage. This is slightly tricky because we have
to handle two unusual cases:
1) Taking the first hugepage TLB miss in the window trap handler.
We'll call the winfix_trampoline when that is detected.
2) An initial TSB allocation via TLB miss races with a hugetlb
fault on another cpu running the same MM. We handle this by
unconditionally loading the TSB we see into the current cpu
even if it's non-NULL at hugetlb_setup time.
Reported-by: Meelis Roos <mroos@ut.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
(pm_idle)() is being removed from linux/pm.h
because Linux does not have such a cross-architecture concept.
sparc uses an idle function pointer in its architecture
specific code. So we re-name sparc use of pm_idle to sparc_idle.
Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
__ARCH_WANT_SYS_RT_SIGACTION,
__ARCH_WANT_SYS_RT_SIGSUSPEND,
__ARCH_WANT_COMPAT_SYS_RT_SIGSUSPEND,
__ARCH_WANT_COMPAT_SYS_SCHED_RR_GET_INTERVAL - not used anymore
CONFIG_GENERIC_{SIGALTSTACK,COMPAT_RT_SIG{ACTION,QUEUEINFO,PENDING,PROCMASK}} -
can be assumed always set.
Mostly mirrors the s390 logic, as unlike x86 we don't need the
SetPageReferenced() bits.
On sparc64 we also lack a user/privileged bit in the huge PMDs.
In order to make this work for THP and non-THP builds, some header
file adjustments were necessary. Namely, provide the PMD_HUGE_* bit
defines and the pmd_large() inline unconditionally rather than
protected by TRANSPARENT_HUGEPAGE.
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
note that due to historical accident we do *not* directly take
generic versions - need to check and invert the sign of signal
number first.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Only alpha and sparc are unusual - they have ka_restorer in it.
And nobody needs that exposed to userland.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Definitions and macros for implementing soreusport.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While a privileged program can open a raw socket, attach some
restrictive filter and drop its privileges (or send the socket to an
unprivileged program through some Unix socket), the filter can still
be removed or modified by the unprivileged program. This commit adds a
socket option to lock the filter (SO_LOCK_FILTER) preventing any
modification of a socket filter program.
This is similar to OpenBSD BIOCLOCK ioctl on bpf sockets, except even
root is not allowed change/drop the filter.
The state of the lock can be read with getsockopt(). No error is
triggered if the state is not changed. -EPERM is returned when a user
tries to remove the lock or to change/remove the filter while the lock
is active. The check is done directly in sk_attach_filter() and
sk_detach_filter() and does not affect only setsockopt() syscall.
Signed-off-by: Vincent Bernat <bernat@luffy.cx>
Signed-off-by: David S. Miller <davem@davemloft.net>
__devinit, __devexit annotations are nops - so drop them.
Likewise for __devexit_p.
Adjusted alignment of arguments when needed.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull signal handling cleanups from Al Viro:
"sigaltstack infrastructure + conversion for x86, alpha and um,
COMPAT_SYSCALL_DEFINE infrastructure.
Note that there are several conflicts between "unify
SS_ONSTACK/SS_DISABLE definitions" and UAPI patches in mainline;
resolution is trivial - just remove definitions of SS_ONSTACK and
SS_DISABLED from arch/*/uapi/asm/signal.h; they are all identical and
include/uapi/linux/signal.h contains the unified variant."
Fixed up conflicts as per Al.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
alpha: switch to generic sigaltstack
new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those
generic compat_sys_sigaltstack()
introduce generic sys_sigaltstack(), switch x86 and um to it
new helper: compat_user_stack_pointer()
new helper: restore_altstack()
unify SS_ONSTACK/SS_DISABLE definitions
new helper: current_user_stack_pointer()
missing user_stack_pointer() instances
Bury the conditionals from kernel_thread/kernel_execve series
COMPAT_SYSCALL_DEFINE: infrastructure
Pull IOMMU updates from Joerg Roedel:
"A few new features this merge-window. The most important one is
probably, that dma-debug now warns if a dma-handle is not checked with
dma_mapping_error by the device driver. This requires minor changes
to some architectures which make use of dma-debug. Most of these
changes have the respective Acks by the Arch-Maintainers.
Besides that there are updates to the AMD IOMMU driver for refactor
the IOMMU-Groups support and to make sure it does not trigger a
hardware erratum.
The OMAP changes (for which I pulled in a branch from Tony Lindgren's
tree) have a conflict in linux-next with the arm-soc tree. The
conflict is in the file arch/arm/mach-omap2/clock44xx_data.c which is
deleted in the arm-soc tree. It is safe to delete the file too so
solve the conflict. Similar changes are done in the arm-soc tree in
the common clock framework migration. A missing hunk from the patch
in the IOMMU tree will be submitted as a seperate patch when the
merge-window is closed."
* tag 'iommu-updates-v3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (29 commits)
ARM: dma-mapping: support debug_dma_mapping_error
ARM: OMAP4: hwmod data: ipu and dsp to use parent clocks instead of leaf clocks
iommu/omap: Adapt to runtime pm
iommu/omap: Migrate to hwmod framework
iommu/omap: Keep mmu enabled when requested
iommu/omap: Remove redundant clock handling on ISR
iommu/amd: Remove obsolete comment
iommu/amd: Don't use 512GB pages
iommu/tegra: smmu: Move bus_set_iommu after probe for multi arch
iommu/tegra: gart: Move bus_set_iommu after probe for multi arch
iommu/tegra: smmu: Remove unnecessary PTC/TLB flush all
tile: dma_debug: add debug_dma_mapping_error support
sh: dma_debug: add debug_dma_mapping_error support
powerpc: dma_debug: add debug_dma_mapping_error support
mips: dma_debug: add debug_dma_mapping_error support
microblaze: dma-mapping: support debug_dma_mapping_error
ia64: dma_debug: add debug_dma_mapping_error support
c6x: dma_debug: add debug_dma_mapping_error support
ARM64: dma_debug: add debug_dma_mapping_error support
intel-iommu: Prevent devices with RMRRs from being placed into SI Domain
...
All architectures have
CONFIG_GENERIC_KERNEL_THREAD
CONFIG_GENERIC_KERNEL_EXECVE
__ARCH_WANT_SYS_EXECVE
None of them have __ARCH_WANT_KERNEL_EXECVE and there are only two callers
of kernel_execve() (which is a trivial wrapper for do_execve() now) left.
Kill the conditionals and make both callers use do_execve().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We can elide flush_tlb_*() calls when _PAGE_VALID is clear
as that is the test used to determine whether or not to
queue up a TLB flush in set_pte_at().
Signed-off-by: David S. Miller <davem@davemloft.net>