Since ocfs2_cow_file_pos will invoke ocfs2_refcount_icow with a NULL as
the struct file pointer, it finally result in a null pointer dereference
in ocfs2_duplicate_clusters_by_page.
This patch replace file pointer with inode pointer in
cow_duplicate_clusters to fix this issue.
[jeff.liu@oracle.com: rebased patch against linux-next tree]
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Acked-by: Tao Ma <tm@tao.ma>
Tested-by: David Weber <wb@munzinger.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert commit 40bd62eb7f ("fs/ocfs2/journal.h: add bits_wanted while
calculating credits in ocfs2_calc_extend_credits").
Unfortunately this change broke fallocate even if there is insufficient
disk space for the preallocation, which is a serious problem.
# df -h
/dev/sda8 22G 1.2G 21G 6% /ocfs2
# fallocate -o 0 -l 200M /ocfs2/testfile
fallocate: /ocfs2/test: fallocate failed: No space left on device
and a kernel warning:
CPU: 3 PID: 3656 Comm: fallocate Tainted: G W O 3.11.0-rc3 #2
Call Trace:
dump_stack+0x77/0x9e
warn_slowpath_common+0xc4/0x110
warn_slowpath_null+0x2a/0x40
start_this_handle+0x6c/0x640 [jbd2]
jbd2__journal_start+0x138/0x300 [jbd2]
jbd2_journal_start+0x23/0x30 [jbd2]
ocfs2_start_trans+0x166/0x300 [ocfs2]
__ocfs2_extend_allocation+0x38f/0xdb0 [ocfs2]
ocfs2_allocate_unwritten_extents+0x3c9/0x520
__ocfs2_change_file_space+0x5e0/0xa60 [ocfs2]
ocfs2_fallocate+0xb1/0xe0 [ocfs2]
do_fallocate+0x1cb/0x220
SyS_fallocate+0x6f/0xb0
system_call_fastpath+0x16/0x1b
JBD2: fallocate wants too many credits (51216 > 4381)
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's always a bad idea to poll on HW bits without a timeout.
The i.MX28 RTC can be easily brought into a state in which the RTC is
not running (until after a power-on-reset) and thus the status bits
which are polled in the driver won't ever change.
This patch prevents the kernel from getting stuck in this case.
Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Acked-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave has reported the following lockdep splat:
=================================
[ INFO: inconsistent lock state ]
3.11.0-rc1+ #9 Not tainted
---------------------------------
inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
kswapd0/49 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&mapping->i_mmap_mutex){+.+.?.}, at: [<c114971b>] page_referenced+0x87/0x5e3
{RECLAIM_FS-ON-W} state was registered at:
mark_held_locks+0x81/0xe7
lockdep_trace_alloc+0x5e/0xbc
__alloc_pages_nodemask+0x8b/0x9b6
__get_free_pages+0x20/0x31
get_zeroed_page+0x12/0x14
__pmd_alloc+0x1c/0x6b
huge_pmd_share+0x265/0x283
huge_pte_alloc+0x5d/0x71
hugetlb_fault+0x7c/0x64a
handle_mm_fault+0x255/0x299
__do_page_fault+0x142/0x55c
do_page_fault+0xd/0x16
error_code+0x6c/0x74
irq event stamp: 3136917
hardirqs last enabled at (3136917): _raw_spin_unlock_irq+0x27/0x50
hardirqs last disabled at (3136916): _raw_spin_lock_irq+0x15/0x78
softirqs last enabled at (3136180): __do_softirq+0x137/0x30f
softirqs last disabled at (3136175): irq_exit+0xa8/0xaa
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&mapping->i_mmap_mutex);
<Interrupt>
lock(&mapping->i_mmap_mutex);
*** DEADLOCK ***
no locks held by kswapd0/49.
stack backtrace:
CPU: 1 PID: 49 Comm: kswapd0 Not tainted 3.11.0-rc1+ #9
Hardware name: Dell Inc. Precision WorkStation 490 /0DT031, BIOS A08 04/25/2008
Call Trace:
dump_stack+0x4b/0x79
print_usage_bug+0x1d9/0x1e3
mark_lock+0x1e0/0x261
__lock_acquire+0x623/0x17f2
lock_acquire+0x7d/0x195
mutex_lock_nested+0x6c/0x3a7
page_referenced+0x87/0x5e3
shrink_page_list+0x3d9/0x947
shrink_inactive_list+0x155/0x4cb
shrink_lruvec+0x300/0x5ce
shrink_zone+0x53/0x14e
kswapd+0x517/0xa75
kthread+0xa8/0xaa
ret_from_kernel_thread+0x1b/0x28
which is a false positive caused by hugetlb pmd sharing code which
allocates a new pmd from withing mapping->i_mmap_mutex. If this
allocation causes reclaim then the lockdep detector complains that we
might self-deadlock.
This is not correct though, because hugetlb pages are not reclaimable so
their mapping will be never touched from the reclaim path.
The patch tells lockup detector that hugetlb i_mmap_mutex is special by
assigning it a separate lockdep class so it won't report possible
deadlocks on unrelated mappings.
[peterz@infradead.org: comment for annotation]
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a BUG which can trigger when direct-IO is used with AOE.
As discussed previously, the fact that some users of the block layer
provide bios that point to pages with a zero _count means that it is not
OK for the network layer to do a put_page on the skb frags during an
skb_linearize, so the aoe driver gets a reference to pages in bios and
puts the reference before ending the bio. And because it cannot use
get_page on a page with a zero _count, it manipulates the value
directly.
It is not OK to increment the _count of a compound page tail, though,
since the VM layer will VM_BUG_ON a non-zero _count. Block users that
do direct I/O can result in the aoe driver seeing compound page tails in
bios. In that case, the same logic works as long as the head of the
compound page is used instead of the tails. This patch handles compound
pages and does not BUG.
It relies on the block layer user leaving the relationship between the
page tail and its head alone for the duration between the submission of
the bio and its completion, whether successful or not.
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix inadvertent breakage in the clone syscall ABI for Microblaze that
was introduced in commit f3268edbe6 ("microblaze: switch to generic
fork/vfork/clone").
The Microblaze syscall ABI for clone takes the parent tid address in the
4th argument; the third argument slot is used for the stack size. The
incorrectly-used CLONE_BACKWARDS type assigned parent tid to the 3rd
slot.
This commit restores the original ABI so that existing userspace libc
code will work correctly.
All kernel versions from v3.8-rc1 were affected.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Lutomirski reported that if a page with _PAGE_SOFT_DIRTY bit set
get swapped out, the bit is getting lost and no longer available when
pte read back.
To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is saved in
pte entry for the page being swapped out. When such page is to be read
back from a swap cache we check for bit presence and if it's there we
clear it and restore the former _PAGE_SOFT_DIRTY bit back.
One of the problem was to find a place in pte entry where we can save
the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The _PAGE_PSE was
chosen for that, it doesn't intersect with swap entry format stored in
pte.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
struct memcg_cache_params has a union. Different parts of this union
are used for root and non-root caches. A part with destroying work is
used only for non-root caches.
I fixed the same problem in another place v3.9-rc1-16204-gf101a94, but
didn't notice this one.
This patch fixes the kernel panic:
[ 46.848187] BUG: unable to handle kernel paging request at 000000fffffffeb8
[ 46.849026] IP: [<ffffffff811a484c>] kmem_cache_destroy_memcg_children+0x6c/0xc0
[ 46.849092] PGD 0
[ 46.849092] Oops: 0000 [#1] SMP
...
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: <stable@vger.kernel.org> [3.9.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull scheduler fixes from Ingo Molnar:
"Docbook fixes that make 99% of the diffstat, plus a oneliner fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Ensure update_cfs_shares() is called for parents of continuously-running tasks
sched: Fix some kernel-doc warnings
Pull perf fixes from Ingo Molnar:
"Two small fixlets"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Add Haswell ULT model number used in Macbook Air and other systems
perf/x86: Fix intel QPI uncore event definitions
Using inner-id for tunnel id is not safe in some rare cases.
E.g. packets coming from multiple sources entering same tunnel
can have same id. Therefore on tunnel packet receive we
could have packets from two different stream but with same
source and dst IP with same ip-id which could confuse ip packet
reassembly.
Following patch reverts optimization from commit
490ab08127 (IP_GRE: Fix IP-Identification.)
CC: Jarno Rajahalme <jrajahalme@nicira.com>
CC: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov says:
====================
Please consider applying the series of bnx2x fixes to net:
* statistics may cause FW assert
* missing fairness configuration in DCB flow
* memory leak in sriov related part
* Illegal PTE access
* Pagefault crash in shutdown flow with cnic
v1->v2
* fixed sparse error pointed by Joe Perches
* added missing signed-off from Sergei Shtylyov
v2->v3
* added missing signed-off from Sergei Shtylyov
* fixed formatting from Sergei Shtylyov
v3->v4
* patch 1/6: fixed declaration order
* patch 2/6 replaced with: protect flows using set_bit constraints
v4->v5
* patch 2/6: replace proprietary locking with semaphore
* droped 1/6: since adds redundant code from Benjamin Poirier
The following patchset contains four netfilter fixes, they are:
* Fix possible invalid access and mangling of the TCPMSS option in
xt_TCPMSS. This was spotted by Julian Anastasov.
* Fix possible off by one access and mangling of the TCP packet in
xt_TCPOPTSTRIP, also spotted by Julian Anastasov.
* Fix possible information leak due to missing initialization of one
padding field of several structures that are included in nfqueue and
nflog netlink messages, from Dan Carpenter.
* Fix TCP window tracking with Fast Open, from Yuchung Cheng.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
There might be a crash as during shutdown flow CNIC might try
to access resources already freed by bnx2x.
Change bnx2x_close() into dev_close() in __bnx2x_remove (shutdown flow)
to guarantee CNIC is notified of the device's change of status.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PTE write access error might occur in MF_ALLOWED mode when IOMMU
is active. The patch adds rmmod HSI indicating to MFW to stop
running queries which might trigger this failure.
Signed-off-by: Barak Witkowsky <barak@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ETS can be enabled as a result of DCB negotiation, then
fairness must be recalculated after each negotiation.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add locking to protect different statistics flows from
running simultaneously.
This in order to serialize statistics requests sent to FW,
otherwise two outstanding queries may cause FW assert.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pm_qos_update_request_timeout() updates a qos and then schedules
a delayed work item to bring the qos back down to the default
after the timeout. When the work item runs, pm_qos_work_fn() will
call pm_qos_update_request() and deadlock because it tries to
cancel itself via cancel_delayed_work_sync(). Future callers of
that qos will also hang waiting to cancel the work that is
canceling itself. Let's extract the little bit of code that does
the real work of pm_qos_update_request() and call it from the
work function so that we don't deadlock.
Before ed1ac6e (PM: don't use [delayed_]work_pending()) this didn't
happen because the work function wouldn't try to cancel itself.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Tejun Heo <tj@kernel.org>
Cc: 3.9+ <stable@vger.kernel.org> # 3.9+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Remove Andrew Gallatin, as he is no longer with Myricom. Add
Hyong-Youb Kim as the new maintainer. Update the website URL.
Signed-off-by: Hyong-Youb Kim <hykim@myri.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DMA sync should sync the whole receive buffer, not just
part of it. Fixes log messages dma_sync_check.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>