Don't put an ACCESS op in OPEN compound if O_EXCL, because ACCESS
will return permission denied for all bits until close.
Fixes a regression due to commit 6168f62c (NFSv4: Add ACCESS operation to
OPEN compound)
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
"Server trunking" is a fancy named for a multi-homed NFS server.
Trunking might occur if a client sends NFS requests for a single
workload to multiple network interfaces on the same server. There
are some implications for NFSv4 state management that make it useful
for a client to know if a single NFSv4 server instance is
multi-homed. (Note this is only a consideration for NFSv4, not for
legacy versions of NFS, which are stateless).
If a client cares about server trunking, no NFSv4 operations can
proceed until that client determines who it is talking to. Thus
server IP trunking discovery must be done when the client first
encounters an unfamiliar server IP address.
The nfs_get_client() function walks the nfs_client_list and matches
on server IP address. The outcome of that walk tells us immediately
if we have an unfamiliar server IP address. It invokes
nfs_init_client() in this case. Thus, nfs4_init_client() is a good
spot to perform trunking discovery.
Discovery requires a client to establish a fresh client ID, so our
client will now send SETCLIENTID or EXCHANGE_ID as the first NFS
operation after a successful ping, rather than waiting for an
application to perform an operation that requires NFSv4 state.
The exact process for detecting trunking is different for NFSv4.0 and
NFSv4.1, so a minorversion-specific init_client callout method is
introduced.
CLID_INUSE recovery is important for the trunking discovery process.
CLID_INUSE is a sign the server recognizes the client's nfs_client_id4
id string, but the client is using the wrong principal this time for
the SETCLIENTID operation. The SETCLIENTID must be retried with a
series of different principals until one works, and then the rest of
trunking discovery can proceed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently, the Linux client uses a unique nfs_client_id4.id string
when identifying itself to distinct NFS servers.
To support transparent state migration, the Linux client will have to
use the same nfs_client_id4 string for all servers it communicates
with (also known as the "uniform client string" approach). Otherwise
NFS servers can not recognize that open and lock state need to be
merged after a file system transition.
Unfortunately, there are some NFSv4.0 servers currently in the field
that do not tolerate the uniform client string approach.
Thus, by default, our NFSv4.0 mounts will continue to use the current
approach, and we introduce a mount option that switches them to use
the uniform model. Client administrators must identify which servers
can be mounted with this option. Eventually most NFSv4.0 servers will
be able to handle the uniform approach, and we can change the default.
The first mount of a server controls the behavior for all subsequent
mounts for the lifetime of that set of mounts of that server. After
the last mount of that server is gone, the client erases the data
structure that tracks the lease. A subsequent lease may then honor
a different "migration" setting.
This patch adds only the infrastructure for parsing the new mount
option. Support for uniform client strings is added in a subsequent
patch.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
An ULP is supposed to be able to replace a GSS rpc_auth object with
another GSS rpc_auth object using rpcauth_create(). However,
rpcauth_create() in 3.5 reliably fails with -EEXIST in this case.
This is because when gss_create() attempts to create the upcall pipes,
sometimes they are already there. For example if a pipe FS mount
event occurs, or a previous GSS flavor was in use for this rpc_clnt.
It turns out that's not the only problem here. While working on a
fix for the above problem, we noticed that replacing an rpc_clnt's
rpc_auth is not safe, since dereferencing the cl_auth field is not
protected in any way.
So we're deprecating the ability of rpcauth_create() to switch an
rpc_clnt's security flavor during normal operation. Instead, let's
add a fresh API that clones an rpc_clnt and gives the clone a new
flavor before it's used.
This makes immediate use of the new __rpc_clone_client() helper.
This can be used in a similar fashion to rpcauth_create() when a
client is hunting for the correct security flavor. Instead of
replacing an rpc_clnt's security flavor in a loop, the ULP replaces
the whole rpc_clnt.
To fix the -EEXIST problem, any ULP logic that relies on replacing
an rpc_clnt's rpc_auth with rpcauth_create() must be changed to use
this API instead.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The OPEN operation has no way to differentiate an open for read and an
open for execution - both look like read to the server. This allowed
users to read files that didn't have READ access but did have EXEC access,
which is obviously wrong.
This patch adds an ACCESS call to the OPEN compound to handle the
difference between OPENs for reading and execution. Since we're going
through the trouble of calling ACCESS, we check all possible access bits
and cache the results hopefully avoiding an ACCESS call in the future.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Ensure that we do return errors from nfs4_proc_layoutget() and that we
don't mark the layout as having failed if the error was due to a
signal or resource problem on the client side.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
It is only set after everyone has dereferenced the transport,
and serves no useful purpose: setting it is racy, so all the
socket code, etc still needs to be able to cope with the cases
where they miss reading it.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the server reboots before it can commit the unstable writes to disk,
then nfs_commit_release_pages() will detect this when it compares the
verifier returned by COMMIT to the one returned by WRITE. When this
happens, the client needs to resend those writes in order to guarantee
that they make it to stable storage.
This patch adds a signalling mechanism to notify fsync() that it
needs to retry all writes before it can exit.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We want to be able to pass on the information that the page was not
dirtied under a lock. Instead of adding a flag parameter, do this
by passing a pointer to a 'struct nfs_lock_owner' that may be NULL.
Also reuse this structure in struct nfs_lock_context to carry the
fl_owner_t and pid_t.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
IBM reported a soft lockup after applying the fix for the rename_lock
deadlock. Commit c83ce989cb ("VFS: Fix the nfs sillyrename regression
in kernel 2.6.38") was found to be the culprit.
The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the
dentry was killed. This flag can be set on non-killed dentries too,
which results in infinite retries when trying to traverse the dentry
tree.
This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is
only set in d_kill() and makes try_to_ascend() test only this flag.
IBM reported successful test results with this patch.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
gcc 4.6+ has support for a externally_visible attribute that prevents the
optimizer from optimizing unused symbols away. Add a __visible macro to
use it with that compiler version or later.
This is used (at least) by the "Link Time Optimization" patchset.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This warning:
In file included from linux/include/linux/tcp.h:227:0,
from linux/include/linux/ipv6.h:221,
from linux/include/net/ipv6.h:16,
from linux/include/linux/sunrpc/clnt.h:26,
from linux/net/sunrpc/stats.c:22:
linux/include/net/sock.h: In function `sk_rmem_schedule':
linux/nfs-2.6/include/net/sock.h:1339:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
is seen with gcc (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2) using the
-Wextra option.
Commit c76562b670 ("netvm: prevent a stream-specific deadlock")
accidentally replaced the "size" parameter of sk_rmem_schedule() with an
unsigned int. This changes the semantics of the comparison in the
return statement.
In sk_wmem_schedule we have syntactically the same comparison, but
"size" is a signed integer. In addition, __sk_mem_schedule() takes a
signed integer for its "size" parameter, so there is an implicit type
conversion in sk_rmem_schedule() anyway.
Revert the "size" parameter back to a signed integer so that the
semantics of the expressions in both sk_[rw]mem_schedule() are exactly
the same.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I found following definition in include/linux/memory.h, in my IA64
platform, SECTION_SIZE_BITS is equal to 32, and MIN_MEMORY_BLOCK_SIZE
will be 0.
#define MIN_MEMORY_BLOCK_SIZE (1 << SECTION_SIZE_BITS)
Because MIN_MEMORY_BLOCK_SIZE is int type and length of 32bits,
so MIN_MEMORY_BLOCK_SIZE(1 << 32) will will equal to 0.
Actually when SECTION_SIZE_BITS >= 31, MIN_MEMORY_BLOCK_SIZE will be wrong.
This will cause wrong system memory infomation in sysfs.
I think it should be:
#define MIN_MEMORY_BLOCK_SIZE (1UL << SECTION_SIZE_BITS)
And "echo offline > memory0/state" will cause following call trace:
kernel BUG at mm/memory_hotplug.c:885!
sh[6455]: bugcheck! 0 [1]
Pid: 6455, CPU 0, comm: sh
psr : 0000101008526030 ifs : 8000000000000fa4 ip : [<a0000001008c40f0>] Not tainted (3.6.0-rc1)
ip is at offline_pages+0x210/0xee0
Call Trace:
show_stack+0x80/0xa0
show_regs+0x640/0x920
die+0x190/0x2c0
die_if_kernel+0x50/0x80
ia64_bad_break+0x3d0/0x6e0
ia64_native_leave_kernel+0x0/0x270
offline_pages+0x210/0xee0
alloc_pages_current+0x180/0x2a0
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull mfd fixes from Samuel Ortiz:
"This is the remaining MFD fixes for 3.6, with 5 pending fixes:
- A tps65217 build error fix.
- A lcp_ich regression fix caused by the MFD driver failing to
initialize the watchdog sub device due to ACPI conflicts.
- 2 MAX77693 interrupt handling bug fixes.
- An MFD core fix, adding an IRQ domain argument to the MFD device
addition API in order to prevent silent and potentially harmful
remapping behaviour changes for drivers supporting non-DT
platforms."
* tag 'mfd-for-linus-3.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
mfd: MAX77693: Fix NULL pointer error when initializing irqs
mfd: MAX77693: Fix interrupt handling bug
mfd: core: Push irqdomain mapping out into devices
mfd: lpc_ich: Fix a 3.5 kernel regression for iTCO_wdt driver
mfd: Move tps65217 regulator plat data handling to regulator
Pull scsi target fixes from Nicholas Bellinger:
"Here is the current set of target-pending fixes headed for v3.6-final
The main parts of this series include bug-fixes from Paolo Bonzini to
address an use-after-free bug in pSCSI sense exception handling, along
with addressing some long-standing bugs wrt the handling of zero-
length SCSI CDB payloads also specific to pSCSI pass-through device
backends."
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target: go through normal processing for zero-length REQUEST_SENSE
target: support zero allocation length in REQUEST SENSE
target: support zero-size allocation lengths in transport_kmap_data_sg
target: fail REPORT LUNS with less than 16 bytes of payload
target: report too-small parameter lists everywhere
target: go through normal processing for zero-length PSCSI commands
target: fix use-after-free with PSCSI sense data
target: simplify code around transport_get_sense_data
target: move transport_get_sense_data
target: Check idr_get_new return value in iscsi_login_zero_tsih_s1
target: Fix ->data_length re-assignment bug with SCSI overflow
Pull more sound fixes from Takashi Iwai:
"Yet more (a bunch of) small fixes that slipped from the previous pull
request. Most of commits are pending ASoC fixes, all of which are
fairly trivial commits."
* tag 'sound-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: wm8904: correct the index
ALSA: hda - Yet another position_fix quirk for ASUS machines
ASoC: tegra: fix maxburst settings in dmaengine code
ASoC: samsung dma - Don't indicate support for pause/resume.
ASoC: mc13783: Remove mono support
ASoC: arizona: Fix typo in 44.1kHz rates
ASoC: spear: correct the check for NULL dma_buffer pointer
sound: tegra_alc5632: remove HP detect GPIO inversion
ASoC: atmel-ssc: include linux/io.h for raw io
ASoC: dapm: Don't force card bias level to be updated
ASoC: dapm: Make sure we update the bias level for CODECs with no op
ASoC: am3517evm: fix error return code
ASoC: ux500_msp_i2s: better use devm functions and fix error return code
ASoC: imx-sgtl5000: fix error return code
This reverts commit 970e178985.
Nikolay Ulyanitsky reported thatthe 3.6-rc5 kernel has a 15-20%
performance drop on PostgreSQL 9.2 on his machine (running "pgbench").
Borislav Petkov was able to reproduce this, and bisected it to this
commit 970e178985 ("sched: Improve scalability via 'CPU buddies' ...")
apparently because the new single-idle-buddy model simply doesn't find
idle CPU's to reschedule on aggressively enough.
Mike Galbraith suspects that it is likely due to the user-mode spinlocks
in PostgreSQL not reacting well to preemption, but we don't really know
the details - I'll just revert the commit for now.
There are hopefully other approaches to improve scheduler scalability
without it causing these kinds of downsides.
Reported-by: Nikolay Ulyanitsky <lystor@gmail.com>
Bisected-by: Borislav Petkov <bp@alien8.de>
Acked-by: Mike Galbraith <efault@gmx.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently the MFD core supports remapping MFD cell interrupts using an
irqdomain but only if the MFD is being instantiated using device tree
and only if the device tree bindings use the pattern of registering IPs
in the device tree with compatible properties. This will be actively
harmful for drivers which support non-DT platforms and use this pattern
for their DT bindings as it will mean that the core will silently change
remapping behaviour and it is also limiting for drivers which don't do
DT with this particular pattern. There is also a potential fragility if
there are interrupts not associated with MFD cells and all the cells are
omitted from the device tree for some reason.
Instead change the code to take an IRQ domain as an optional argument,
allowing drivers to take the decision about the parent domain for their
interrupts. The one current user of this feature is ab8500-core, it has
the domain lookup pushed out into the driver.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
ASoC: Updates for 3.6
A bigger set of updates than I'm entirely comfortable with - things
backed up a bit due to travel. As ever the majority of these are small,
focused updates for specific drivers though there are a couple of core
changes. There's been good exposure in -next.
The AT91 patch fixes a build break.
Pull i2c embedded fixes from Wolfram Sang:
"The last bunch of (typical) i2c-embedded driver fixes for 3.6.
Also update the MAINTAINERS file to point to my tree since people keep
asking where to find their patches."
* 'i2c-embedded/for-current' of git://git.pengutronix.de/git/wsa/linux:
i2c: algo: pca: Fix mode selection for PCA9665
MAINTAINERS: fix tree for current i2c-embedded development
i2c: mxs: correctly setup speed for non devicetree
i2c: pnx: Fix read transactions of >= 2 bytes
i2c: pnx: Fix bit definitions
Pull drm fixes from Dave Airlie:
"I realise this a bit bigger than I would want at this point.
Exynos is a large chunk, I got them to half what they wanted already,
and hey its ARM based, so not going to hurt many people.
Radeon has only two fixes, but the PLL fixes were a bit bigger, but
required for a lot of scenarios, the fence fix is really urgent.
vmwgfx: I've pulled in a dumb ioctl support patch that I was going to
shove in later and cc stable, but we need it asap, its mainly to stop
mesa growing a really ugly dependency in userspace to run stuff on
vmware, and if I don't stick it in the kernel now, everyone will have
to ship ugly userspace libs to workaround it.
nouveau: single urgent fix found in F18 testing, causes X to not start
properly when f18 plymouth is used
i915: smattering of fixes and debug quieting
gma500: single regression fix
So as I said a bit large, but its fairly well scattered and its all
stuff I'll be shipping in F18's 3.6 kernel."
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (26 commits)
drm/nouveau: fix booting with plymouth + dumb support
drm/radeon: make 64bit fences more robust v3
drm/radeon: rework pll selection (v3)
drm: Drop the NV12M and YUV420M formats
drm/exynos: remove DRM_FORMAT_NV12M from plane module
drm/exynos: fix double call of drm_prime_(init/destroy)_file_private
drm/exynos: add dummy support for dmabuf-mmap
drm/exynos: Add missing braces around sizeof in exynos_mixer.c
drm/exynos: Add missing braces around sizeof in exynos_hdmi.c
drm/exynos: Make g2d_pm_ops static
drm/exynos: Add dependency for G2D in Kconfig
drm/exynos: fixed page align bug.
drm/exynos: Use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(.. [1]
drm/exynos: Use devm_* functions in exynos_drm_g2d.c file
drm/exynos: Use devm_kzalloc in exynos_drm_hdmi.c file
drm/exynos: Use devm_kzalloc in exynos_drm_vidi.c file
drm/exynos: Remove redundant check in exynos_drm_fimd.c file
drm/exynos: Remove redundant check in exynos_hdmi.c file
vmwgfx: add dumb ioctl support
gma500: Fix regression on Oaktrail devices
...
Pull perf fixes from Ingo Molnar:
"This tree includes various fixes"
Ingo really needs to improve on the whole "explain git pull" part.
"Various fixes" indeed.
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/hwpb: Invoke __perf_event_disable() if interrupts are already disabled
perf/x86: Enable Intel Cedarview Atom suppport
perf_event: Switch to internal refcount, fix race with close()
oprofile, s390: Fix uninitialized memory access when writing to oprofilefs
perf/x86: Fix microcode revision check for SNB-PEBS
Pull networking fixes from David Miller:
1) Use after free and new device IDs in bluetooth from Andre Guedes,
Yevgeniy Melnichuk, Gustavo Padovan, and Henrik Rydberg.
2) Fix crashes with short packet lengths and VLAN in pktgen, from
Nishank Trivedi.
3) mISDN calls flush_work_sync() with locks held, fix from Karsten
Keil.
4) Packet scheduler gred parameters are reported to userspace
improperly scaled, and WRED idling is not performed correctly. All
from David Ward.
5) Fix TCP socket refcount problem in ipv6, from Julian Anastasov.
6) ibmveth device has RX queue alignment requirements which are not
being explicitly met resulting in sporadic failures, fix from
Santiago Leon.
7) Netfilter needs to take care when interpreting sockets attached to
socket buffers, they could be time-wait minisockets. Fix from Eric
Dumazet.
8) sock_edemux() has the same issue as netfilter did in #7 above, fix
from Eric Dumazet.
9) Avoid infinite loops in CBQ scheduler with some configurations, from
Eric Dumazet.
10) Deal with "Reflection scan: an Off-Path Attack on TCP", from Jozsef
Kadlecsik.
11) SCTP overcharges socket for TX packets, fix from Thomas Graf.
12) CODEL packet scheduler should not reset it's state every time it
builds a new flow, fix from Eric Dumazet.
13) Fix memory leak in nl80211, from Wei Yongjun.
14) NETROM doesn't check skb_copy_datagram_iovec() return values, from
Alan Cox.
15) l2tp ethernet was using sizeof(ETH_HLEN) instead of plain ETH_HLEN,
oops. From Eric Dumazet.
16) Fix selection of ath9k chips on which PA linearization and AM2PM
predistoration are used, from Felix Fietkau.
17) Flow steering settings in mlx4 driver need to be validated properly,
from Hadar Hen Zion.
18) bnx2x doesn't show the correct link duplex setting, from Yaniv
Rosner.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (75 commits)
pktgen: fix crash with vlan and packet size less than 46
bnx2x: Add missing afex code
bnx2x: fix registers dumped
bnx2x: correct advertisement of pause capabilities
bnx2x: display the correct duplex value
bnx2x: prevent timeouts when using PFC
bnx2x: fix stats copying logic
bnx2x: Avoid sending multiple statistics queries
net: qmi_wwan: call subdriver with control intf only
net_sched: gred: actually perform idling in WRED mode
net_sched: gred: fix qave reporting via netlink
net_sched: gred: eliminate redundant DP prio comparisons
net_sched: gred: correct comment about qavg calculation in RIO mode
mISDN: Fix wrong usage of flush_work_sync while holding locks
netfilter: log: Fix log-level processing
net-sched: sch_cbq: avoid infinite loop
net: qmi_wwan: fix Gobi device probing for un2430
net: fix net/core/sock.c build error
ixp4xx_hss: fix build failure due to missing linux/module.h inclusion
caif: move the dereference below the NULL test
...
Pull driver core fix from Greg Kroah-Hartman:
"Here is one fix for 3.6-rc6 for the kobject.h file.
It fixes a reported oops if CONFIG_HOTPLUG is disabled. It's been in
the linux-next tree for a while now.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'driver-core-3.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
kobject: fix oops with "input0: bad kobj_uevent_env content in show_uevent()"
It is a bad idea to hold a spinlock and call flush_work_sync.
Move the workqueue cleanup outside the spinlock and use cancel_work_sync,
on closing the channel this seems to be the more correct function.
Remove the never used and constant return value of mISDN_freebchannel.
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Cc: <stable@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>