Provide rcu_virt_note_context_switch() for vitalization use to note
quiescent state during guest entry.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Many rcu callbacks functions just call kfree() on the base structure.
These functions are trivial, but their size adds up, and furthermore
when they are used in a kernel module, that module must invoke the
high-latency rcu_barrier() function at module-unload time.
The kfree_rcu() function introduced by this commit addresses this issue.
Rather than encoding a function address in the embedded rcu_head
structure, kfree_rcu() instead encodes the offset of the rcu_head
structure within the base structure. Because the functions are not
allowed in the low-order 4096 bytes of kernel virtual memory, offsets
up to 4095 bytes can be accommodated. If the offset is larger than
4095 bytes, a compile-time error will be generated in __kfree_rcu().
If this error is triggered, you can either fall back to use of call_rcu()
or rearrange the structure to position the rcu_head structure into the
first 4096 bytes.
Note that the allowable offset might decrease in the future, for example,
to allow something like kmem_cache_free_rcu().
The new kfree_rcu() function can replace code as follows:
call_rcu(&p->rcu, simple_kfree_callback);
where "simple_kfree_callback()" might be defined as follows:
void simple_kfree_callback(struct rcu_head *p)
{
struct foo *q = container_of(p, struct foo, rcu);
kfree(q);
}
with the following:
kfree_rcu(&p->rcu, rcu);
Note that the "rcu" is the name of a field in the structure being
freed. The reason for using this rather than passing in a pointer
to the base structure is that the above approach allows better type
checking.
This commit is based on earlier work by Lai Jiangshan and Manfred Spraul:
Lai's V1 patch: http://lkml.org/lkml/2008/9/18/1
Manfred's patch: http://lkml.org/lkml/2009/1/2/115
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Verify that rcu_head structures are aligned to a four-byte boundary.
This check is enabled by CONFIG_DEBUG_OBJECTS_RCU_HEAD.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
It is not possible to accurately correlate rcutorture output with that
of debugfs. This patch therefore adds a debugfs file that prints out
the rcutorture version number, permitting easy correlation.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
If RCU priority boosting is to be meaningful, callback invocation must
be boosted in addition to preempted RCU readers. Otherwise, in presence
of CPU real-time threads, the grace period ends, but the callbacks don't
get invoked. If the callbacks don't get invoked, the associated memory
doesn't get freed, so the system is still subject to OOM.
But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit
moves the callback invocations to a kthread, which can be boosted easily.
Also add comments and properly synchronized all accesses to
rcu_cpu_kthread_task, as suggested by Lai Jiangshan.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Out of the entire GART/VM subsystem, the hw designers changed
the location of 3 regs.
v2: airlied: add parameter for userspace to work from.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: wm831x-ts - move BTN_TOUCH reporting to data transfer
Input: wm831x-ts - allow IRQ flags to be specified
Input: wm831x-ts - fix races with IRQ management
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
sysctl: net: call unregister_net_sysctl_table where needed
Revert: veth: remove unneeded ifname code from veth_newlink()
smsc95xx: fix reset check
tg3: Fix failure to enable WoL by default when possible
networking: inappropriate ioctl operation should return ENOTTY
amd8111e: trivial typo spelling: Negotitate -> Negotiate
ipv4: don't spam dmesg with "Using LC-trie" messages
af_unix: Only allow recv on connected seqpacket sockets.
mii: add support of pause frames in mii_get_an
net: ftmac100: fix scheduling while atomic during PHY link status change
usbnet: Transfer of maintainership
usbnet: add support for some Huawei modems with cdc-ether ports
bnx2: cancel timer on device removal
iwl4965: fix "Received BA when not expected"
iwlagn: fix "Received BA when not expected"
dsa/mv88e6131: fix unknown multicast/broadcast forwarding on mv88e6085
usbnet: Resubmit interrupt URB if device is open
iwl4965: fix "TX Power requested while scanning"
iwlegacy: led stay solid on when no traffic
b43: trivial: update module info about ucode16_mimo firmware
...
Move the SMBus device ID definitions of recent devices from pci_ids.h
to the i2c-i801.c driver file. They don't have to be shared, as they
are clearly identified and only used in this driver. In the future,
such IDs will go to i2c-i801 directly. This will make adding support
for new devices much faster and easier, as it will avoid cross-
subsystem patch sets and merge conflicts.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Seth Heasley <seth.heasley@intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/i915: restore only the mode of this driver on lastclose (v2)
drm/radeon/kms: add info query for tile pipes
drm/radeon/kms: add missing safe regs for 6xx/7xx
drm: select FRAMEBUFFER_CONSOLE_PRIMARY if we have FRAMEBUFFER_CONSOLE
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
nfs: don't lose MS_SYNCHRONOUS on remount of noac mount
NFS: Return meaningful status from decode_secinfo()
NFSv4: Ensure we request the ordinary fileid when doing readdirplus
NFSv4: Ensure that clientid and session establishment can time out
SUNRPC: Allow RPC calls to return ETIMEDOUT instead of EIO
NFSv4.1: Don't loop forever in nfs4_proc_create_session
NFSv4: Handle NFS4ERR_WRONGSEC outside of nfs4_handle_exception()
NFSv4.1: Don't update sequence number if rpc_task is not sent
NFSv4.1: Ensure state manager thread dies on last umount
SUNRPC: Fix the SUNRPC Kerberos V RPCSEC_GSS module dependencies
NFS: Use correct variable for page bounds checking
NFS: don't negotiate when user specifies sec flavor
NFS: Attempt mount with default sec flavor first
NFS: flav_array honors NFS_MAX_SECFLAVORS
NFS: Fix infinite loop in gss_create_upcall()
Don't mark_inode_dirty_sync() while holding lock
NFS: Get rid of pointless test in nfs_commit_done
NFS: Remove unused argument from nfs_find_best_sec()
NFS: Eliminate duplicate call to nfs_mark_request_dirty
NFS: Remove dead code from nfs_fs_mount()
Resubmit interrupt URB if device is open. Use a flag set in
usbnet_open() to determine this state. Also kill and free
interrupt URB in usbnet_disconnect().
[Rebased off git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git]
Signed-off-by: Paul Stewart <pstew@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The huge_memory.c THP page fault was allowed to run if vm_ops was null
(which would succeed for /dev/zero MAP_PRIVATE, as the f_op->mmap wouldn't
setup a special vma->vm_ops and it would fallback to regular anonymous
memory) but other THP logics weren't fully activated for vmas with vm_file
not NULL (/dev/zero has a not NULL vma->vm_file).
So this removes the vm_file checks so that /dev/zero also can safely use
THP (the other albeit safer approach to fix this bug would have been to
prevent the THP initial page fault to run if vm_file was set).
After removing the vm_file checks, this also makes huge_memory.c stricter
in khugepaged for the DEBUG_VM=y case. It doesn't replace the vm_file
check with a is_pfn_mapping check (but it keeps checking for VM_PFNMAP
under VM_BUG_ON) because for a is_cow_mapping() mapping VM_PFNMAP should
only be allowed to exist before the first page fault, and in turn when
vma->anon_vma is null (so preventing khugepaged registration). So I tend
to think the previous comment saying if vm_file was set, VM_PFNMAP might
have been set and we could still be registered in khugepaged (despite
anon_vma was not NULL to be registered in khugepaged) was too paranoid.
The is_linear_pfn_mapping check is also I think superfluous (as described
by comment) but under DEBUG_VM it is safe to stay.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=33682
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Caspar Zhang <bugs@casparzhang.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: <stable@kernel.org> [2.6.38.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This allows maximum flexibility for configuring the direct GPIO based
interrupts.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Currently there is a race in the MMC core between a card-detect
rescan work and the clock-gating work, scheduled from a command
completion. Fix it by removing the dedicated clock-gating mutex
and using the MMC standard locking mechanism instead.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Cc: Simon Horman <horms@verge.net.au>
Cc: Magnus Damm <damm@opensource.se>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Cc: <stable@kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (42 commits)
[media] media: vb2: correct queue initialization order
[media] media: vb2: fix incorrect v4l2_buffer->flags handling
[media] s5p-fimc: Add support for the buffer timestamps and sequence
[media] s5p-fimc: Fix bytesperline and plane payload setup
[media] s5p-fimc: Do not allow changing format after REQBUFS
[media] s5p-fimc: Fix FIMC3 pixel limits on Exynos4
[media] tda18271: update tda18271c2_rf_cal as per NXP's rev.04 datasheet
[media] tda18271: update tda18271_rf_band as per NXP's rev.04 datasheet
[media] tda18271: fix bad calculation of main post divider byte
[media] tda18271: prog_cal and prog_tab variables should be s32, not u8
[media] tda18271: fix calculation bug in tda18271_rf_tracking_filters_init
[media] omap3isp: queue: Don't corrupt buf->npages when get_user_pages() fails
[media] v4l: Don't register media entities for subdev device nodes
[media] omap3isp: Don't increment node entity use count when poweron fails
[media] omap3isp: lane shifter support
[media] omap3isp: ccdc: support Y10/12, 8-bit bayer fmts
[media] media: add missing 8-bit bayer formats and Y12
[media] v4l: add V4L2_PIX_FMT_Y12 format
cx23885: Fix stv0367 Kconfig dependency
[media] omap3isp: Use isp xclk defines
...
Fix up trivial conflict (spelink errurs) in drivers/media/video/omap3isp/isp.c
When readdir() returns a directory entry for the root of a mounted
filesystem, Linux follows the old convention of returning the inode
number of the covered directory (despite newer versions of POSIX declaring
that this is a bug).
To ensure this continues to work, the NFSv4 readdir implementation requests
the 'mounted-on-fileid' from the server.
However, readdirplus also needs to instantiate an inode for this entry, and
for that, we also need to request the real fileid as per this patch.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
i915 calls the panic handler function on last close to reset the modes,
however this is a really bad idea for multi-gpu machines, esp shareable
gpus machines. So add a new entry point for the driver to just restore
its own fbcon mode.
v2: move code into fb helper, fix panic code to block mode change on
powered off GPUs.
[airlied: this hits drm core and I wrote it and it was reviewed on intel-gfx
so really I signed it off twice ;-).]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Now that the whole dcache_hash_bucket crap is gone, go all the way and
also remove the weird locking layering violations for locking the hash
buckets. Add hlist_bl_lock/unlock helpers to move the locking into the
list abstraction instead of requiring each caller to open code it.
After all allowing for the bit locks is the whole point of these helpers
over the plain hlist variant.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we are waiting for the bit-lock to be released, and are looping
over the 'cpu_relax()' should not be doing anything else - otherwise we
miss the point of trying to do the whole 'cpu_relax()'.
Do the preemption enable/disable around the loop, rather than inside of
it.
Noticed when I was looking at the code generation for the dcache
__d_drop usage, and the code just looked very odd.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On occasion, it is useful for the NFS layer to distinguish between
soft timeouts and other EIO errors due to (say) encoding errors,
or authentication errors.
The following patch ensures that the default behaviour of the RPC
layer remains to return EIO on soft timeouts (until we have
audited all the callers).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If a server for some reason keeps sending NFS4ERR_DELAY errors, we can end
up looping forever inside nfs4_proc_create_session, and so the usual
mechanisms for detecting if the nfs_client is dead don't work.
Fix this by ensuring that we loop inside the nfs4_state_manager thread
instead.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>