commit fce6647757 upstream.
Current mem_cgroup_force_empty() only ensures mem->res.usage == 0 on
success. But this doesn't guarantee memcg's LRU is really empty, because
there are some cases in which !PageCgrupUsed pages exist on memcg's LRU.
For example:
- Pages can be uncharged by its owner process while they are on LRU.
- race between mem_cgroup_add_lru_list() and __mem_cgroup_uncharge_common().
So there can be a case in which the usage is zero but some of the LRUs are not empty.
OTOH, mem_cgroup_del_lru_list(), which can be called asynchronously with
rmdir, accesses the mem_cgroup, so this access can cause a problem if it
races with rmdir because the mem_cgroup might have been freed by rmdir.
Actually, I saw a bug which seems to be caused by this race.
[1530745.949906] BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
[1530745.950651] IP: [<ffffffff810fbc11>] mem_cgroup_del_lru_list+0x30/0x80
[1530745.950651] PGD 3863de067 PUD 3862c7067 PMD 0
[1530745.950651] Oops: 0002 [#1] SMP
[1530745.950651] last sysfs file: /sys/devices/system/cpu/cpu7/cache/index1/shared_cpu_map
[1530745.950651] CPU 3
[1530745.950651] Modules linked in: configs ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp nfsd nfs_acl auth_rpcgss exportfs autofs4 hidp rfcomm l2cap crc16 bluetooth lockd sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_multipath scsi_dh video output sbs sbshc battery ac lp kvm_intel kvm sg ide_cd_mod cdrom serio_raw tpm_tis tpm tpm_bios acpi_memhotplug button parport_pc parport rtc_cmos rtc_core rtc_lib e1000 i2c_i801 i2c_core pcspkr dm_region_hash dm_log dm_mod ata_piix libata shpchp megaraid_mbox sd_mod scsi_mod megaraid_mm ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: freq_table]
[1530745.950651] Pid: 19653, comm: shmem_test_02 Tainted: G M 2.6.32-mm1-00701-g2b04386 #3 Express5800/140Rd-4 [N8100-1065]
[1530745.950651] RIP: 0010:[<ffffffff810fbc11>] [<ffffffff810fbc11>] mem_cgroup_del_lru_list+0x30/0x80
[1530745.950651] RSP: 0018:ffff8803863ddcb8 EFLAGS: 00010002
[1530745.950651] RAX: 00000000000001e0 RBX: ffff8803abc02238 RCX: 00000000000001e0
[1530745.950651] RDX: 0000000000000000 RSI: ffff88038611a000 RDI: ffff8803abc02238
[1530745.950651] RBP: ffff8803863ddcc8 R08: 0000000000000002 R09: ffff8803a04c8643
[1530745.950651] R10: 0000000000000000 R11: ffffffff810c7333 R12: 0000000000000000
[1530745.950651] R13: ffff880000017f00 R14: 0000000000000092 R15: ffff8800179d0310
[1530745.950651] FS: 0000000000000000(0000) GS:ffff880017800000(0000) knlGS:0000000000000000
[1530745.950651] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[1530745.950651] CR2: 0000000000000230 CR3: 0000000379d87000 CR4: 00000000000006e0
[1530745.950651] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1530745.950651] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[1530745.950651] Process shmem_test_02 (pid: 19653, threadinfo ffff8803863dc000, task ffff88038612a8a0)
[1530745.950651] Stack:
[1530745.950651] ffffea00040c2fe8 0000000000000000 ffff8803863ddd98 ffffffff810c739a
[1530745.950651] <0> 00000000863ddd18 000000000000000c 0000000000000000 0000000000000000
[1530745.950651] <0> 0000000000000002 0000000000000000 ffff8803863ddd68 0000000000000046
[1530745.950651] Call Trace:
[1530745.950651] [<ffffffff810c739a>] release_pages+0x142/0x1e7
[1530745.950651] [<ffffffff810c778f>] ? pagevec_move_tail+0x6e/0x112
[1530745.950651] [<ffffffff810c781e>] pagevec_move_tail+0xfd/0x112
[1530745.950651] [<ffffffff810c78a9>] lru_add_drain+0x76/0x94
[1530745.950651] [<ffffffff810dba0c>] exit_mmap+0x6e/0x145
[1530745.950651] [<ffffffff8103f52d>] mmput+0x5e/0xcf
[1530745.950651] [<ffffffff81043ea8>] exit_mm+0x11c/0x129
[1530745.950651] [<ffffffff8108fb29>] ? audit_free+0x196/0x1c9
[1530745.950651] [<ffffffff81045353>] do_exit+0x1f5/0x6b7
[1530745.950651] [<ffffffff8106133f>] ? up_read+0x2b/0x2f
[1530745.950651] [<ffffffff8137d187>] ? lockdep_sys_exit_thunk+0x35/0x67
[1530745.950651] [<ffffffff81045898>] do_group_exit+0x83/0xb0
[1530745.950651] [<ffffffff810458dc>] sys_exit_group+0x17/0x1b
[1530745.950651] [<ffffffff81002c1b>] system_call_fastpath+0x16/0x1b
[1530745.950651] Code: 54 53 0f 1f 44 00 00 83 3d cc 29 7c 00 00 41 89 f4 75 63 eb 4e 48 83 7b 08 00 75 04 0f 0b eb fe 48 89 df e8 18 f3 ff ff 44 89 e2 <48> ff 4c d0 50 48 8b 05 2b 2d 7c 00 48 39 43 08 74 39 48 8b 4b
[1530745.950651] RIP [<ffffffff810fbc11>] mem_cgroup_del_lru_list+0x30/0x80
[1530745.950651] RSP <ffff8803863ddcb8>
[1530745.950651] CR2: 0000000000000230
[1530745.950651] ---[ end trace c3419c1bb8acc34f ]---
[1530745.950651] Fixing recursive fault but reboot is needed!
The problem here is pages on LRU may contain pointer to stale memcg. To
make res->usage to be 0, all pages on memcg must be uncharged or moved to
another(parent) memcg. Moved page_cgroup have already removed from
original LRU, but uncharged page_cgroup contains pointer to memcg withou
PCG_USED bit. (This asynchronous LRU work is for improving performance.)
If PCG_USED bit is not set, page_cgroup will never be added to memcg's
LRU. So, about pages not on LRU, they never access stale pointer. Then,
what we have to take care of is page_cgroup _on_ LRU list. This patch
fixes this problem by making mem_cgroup_force_empty() visit all LRUs
before exiting its loop and guarantee there are no pages on its LRU.
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit eb29a5cc0b upstream.
Fix divide by zero and broken output. Commit 600ce1a0fa ("fix clock
setting for Samsung SoC Framebuffer") introduced a mandatory refresh
parameter to the platform data for the S3C framebuffer but did not
introduce any validation code, causing existing platforms (none of which
have refresh set) to divide by zero whenever the framebuffer is
configured, generating warnings and unusable output.
Ben Dooks noted several problems with the patch:
- The platform data supplies the pixclk directly and should already
have taken care of the refresh rate.
- The addition of a window ID parameter doesn't help since only the
root framebuffer can control the pixclk.
- pixclk is specified in picoseconds (rather than Hz) as the patch
assumed.
and suggests reverting the commit so do that. Without fixing this no
mainline user of the driver will produce output.
[akpm@linux-foundation.org: don't revert the correct bit]
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: InKi Dae <inki.dae@samsung.com>
Cc: Kyungmin Park <kmpark@infradead.org>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 976ae32be4 upstream.
inotify will WARN() if it finds that the idr and the fsnotify internals
somehow got out of sync. It was only supposed to do this once but due
to this stupid bug it would warn every single time a problem was
detected.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 9e572cc987 upstream.
Since commit 7e790dd5fc ("inotify: fix
error paths in inotify_update_watch") inotify changed the manor in which
it gave watch descriptors back to userspace. Previous to this commit
inotify acted like the following:
inotify_add_watch(X, Y, Z) = 1
inotify_rm_watch(X, 1);
inotify_add_watch(X, Y, Z) = 2
but after this patch inotify would return watch descriptors like so:
inotify_add_watch(X, Y, Z) = 1
inotify_rm_watch(X, 1);
inotify_add_watch(X, Y, Z) = 1
which I saw as equivalent to opening an fd where
open(file) = 1;
close(1);
open(file) = 1;
seemed perfectly reasonable. The issue is that quite a bit of userspace
apparently relies on the behavior in which watch descriptors will not be
quickly reused. KDE relies on it, I know some selinux packages rely on
it, and I have heard complaints from other random sources such as debian
bug 558981.
Although the man page implies what we do is ok, we broke userspace so
this patch almost reverts us to the old behavior. It is still slightly
racey and I have patches that would fix that, but they are rather large
and this will fix it for all real world cases. The race is as follows:
- task1 creates a watch and blocks in idr_new_watch() before it updates
the hint.
- task2 creates a watch and updates the hint.
- task1 updates the hint with it's older wd
- task removes the watch created by task2
- task adds a new watch and will reuse the wd originally given to task2
it requires moving some locking around the hint (last_wd) but this should
solve it for the real world and be -stable safe.
As a side effect this patch papers over a bug in the lib/idr code which
is causing a large number WARN's to pop on people's system and many
reports in kerneloops.org. I'm working on the root cause of that idr
bug seperately but this should make inotify immune to that issue.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit fc61901373 upstream.
Some BIOSes fail to initialise the GTT, which will cause DMA faults when
the IOMMU is enabled. We need to clear the whole thing to point at the
scratch page, not just the part that Linux is going to use.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
[anholt: Note that this may also help with stability in the presence of
driver bugs, by not drawing to memory we don't own]
Signed-off-by: Eric Anholt <eric@anholt.net>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 2570a4f542 upstream.
This fixes CERT-FI FICORA #341748
Discovered by Olli Jarva and Tuomo Untinen from the CROSS
project at Codenomicon Ltd.
Just like in CVE-2007-4567, we can't rely upon skb_dst() being
non-NULL at this point. We fixed that in commit
e76b2b2567 ("[IPV6]: Do no rely on
skb->dst before it is assigned.")
However commit 483a47d2fe ("ipv6: added
net argument to IP6_INC_STATS_BH") put a new version of the same bug
into this function.
Complicating analysis further, this bug can only trigger when network
namespaces are enabled in the build. When namespaces are turned off,
the dev_net() does not evaluate it's argument, so the dereference
would not occur.
So, for a long time, namespaces couldn't be turned on unless SYSFS was
disabled. Therefore, this code has largely been disabled except by
people turning it on explicitly for namespace development.
With help from Eugene Teo <eugene@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit b4c30aad39 upstream.
Several leaks in audit_tree didn't get caught by commit
318b6d3d7d, including the leak on normal
exit in case of multiple rules refering to the same chunk.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 6f5d511489 upstream.
... aka "Al had badly fscked up when writing that thing and nobody
noticed until Eric had fixed leaks that used to mask the breakage".
The function essentially creates a copy of old array sans one element
and replaces the references to elements of original (they are on cyclic
lists) with those to corresponding elements of new one. After that the
old one is fair game for freeing.
First of all, there's a dumb braino: when we get to list_replace_init we
use indices for wrong arrays - position in new one with the old array
and vice versa.
Another bug is more subtle - termination condition is wrong if the
element to be excluded happens to be the last one. We shouldn't go
until we fill the new array, we should go until we'd finished the old
one. Otherwise the element we are trying to kill will remain on the
cyclic lists...
That crap used to be masked by several leaks, so it was not quite
trivial to hit. Eric had fixed some of those leaks a while ago and the
shit had hit the fan...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is a backport of the mainline patches
cf0277e714045cfb71a3b49bb574e4
Here is the description of the first of
those patches (the other two just fixed
bugs added by that patch):
Since I removed the master netdev, we've been
keeping internal queues only, and even before
that we never told the networking stack above
the virtual interfaces about congestion. This
means that packets are queued in mac80211 and
the upper layers never know, possibly leading
to memory exhaustion and other problems.
This patch makes all interfaces multiqueue and
uses ndo_select_queue to put the packets into
queues per AC. Additionally, when the driver
stops a queue, we now stop all corresponding
queues for the virtual interfaces as well.
The injection case will use VO by default for
non-data frames, and BE for data frames, but
downgrade any data frames according to ACM. It
needs to be fleshed out in the future to allow
chosing the queue/AC in radiotap.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Stable commit 0399123f3d didn't match the
original upstream commit. The CONFIG_MMU check was added much too early
in the list disabling a lot of proc entries in the process.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit cda9d05c49 upstream.
This code generally fails to adjust the render clock, and when it does,
it conflicts with some other register settings and can cause problems.
So remove this code altogether. I'm reworking it now to do the right
thing, but the only bit it will share is the VBT check for whether
reclocking is supported, so I'm leaving that bit.
Reverts most of 652c393a33 ("add dynamic
clock frequency control"), though for many the regressions showed up
in the later 181a5336d6 ("Fix render
reclock availability detection").
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit d790744880 upstream.
Various missing sanity checks caused rejected action frames to be
interpreted as channel switch announcements, which can cause a client
mode interface to switch away from its operating channel, thereby losing
connectivity. This patch ensures that only spectrum management action
frames are processed by the CSA handling function and prevents rejected
action frames from getting processed by the MLME code.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 8a9ac160e8 upstream.
tid is used as an array offset.
agg = &priv->stations[sta_id].tid[tid].agg;
iwl4965_tx_status_reply_tx(priv, agg, tx_resp, txq_id, index);
It should be limitted to MAX_TID_COUNT - 1;
struct iwl_tid_data tid[MAX_TID_COUNT];
regards,
dan carpenter
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit e12822e1d3 upstream.
This fixes a syntax error when setting up the user regulatory
hint. This change yields the same exact binary object though
so it ends up just being a syntax typo fix, fortunately.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 359207c687 upstream.
Commit 8bf3d79bc4 enabled EEPROM
checksum checks to avoid bogus bug reports but failed to address
updating the code to consider devices with custom EEPROM sizes.
Devices with custom sized EEPROMs have the upper limit size stuffed
in the EEPROM. Use this as the upper limit instead of the static
default size. In case of a checksum error also provide back the
max size and whether or not this was the default size or a custom
one. If the EEPROM is busted we add a failsafe check to ensure
we don't loop forever or try to read bogus areas of hardware.
This closes bug 14874
http://bugzilla.kernel.org/show_bug.cgi?id=14874
Cc: David Quan <david.quan@atheros.com>
Cc: Stephen Beahm <stephenbeahm@comcast.net>
Reported-by: Joshua Covington <joshuacov@googlemail.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit c5cae661d6 upstream.
In 65f63384 "xen: improve error handling in do_suspend" I said:
- xs_suspend()/xs_resume() and dpm_suspend_noirq()/dpm_resume_noirq() were not
nested in the obvious way.
and changed the ordering of the calls as so:
BEFORE AFTER
xs_suspend dpm_suspend_noirq
dpm_suspend_noirq xs_suspend
*SUSPEND* *SUSPEND*
dpm_resume_noirq dpm_resume_noirq
xs_resume xs_resume
Clearly this is not an improvement and I was talking rubbish.
In particular the new ordering is susceptible to a hang if a xenstore write is
in progress at the point at which the suspend kicks in. When the suspend
process calls xs_suspend it tries to take the request_mutex but if a write is
in progress it could be looping in xenbus_xs.c:read_reply() waiting for
something to arrive on &xs_state.reply_list while holding the request_mutex
(taken in the caller of read_reply).
However if we have done dpm_suspend_noirq before xs_suspend then we won't get
any more xenstore interrupts and process_msg() will never be woken up to add
anything to the reply_list.
Fix this by calling xs_suspend before dpm_suspend_noirq. If dpm_suspend_noirq
fails then make sure we go through the xs_suspend_cancel() code path.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 05b5d89823 upstream.
Commit fd8fbfc1 modified the way we find amount of reserved space
belonging to an inode. The amount of reserved space is checked
from dquot_transfer and thus inode_reserved_space gets called
even for filesystems that don't provide get_reserved_space callback
which results in a BUG.
Fix the problem by checking get_reserved_space callback and return 0 if
the filesystem does not provide it.
CC: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit bb595c923b upstream.
The ADT7462_PIN28_VOLT value is a 4-bit field, so the corresponding
shift must be 4.
Signed-off-by: Roger Blofeld <blofeldus@yahoo.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 1fe63ab47a upstream.
The max junction temperature of Atom N450/D410/D510 CPUs is 100 degrees
Celsius. Since these CPUs are always coupled with Intel NM10 chipset in
one package, the best way to verify whether an Atom CPU is N450/D410/D510
is to check the host bridge device.
Signed-off-by: Yong Wang <yong.y.wang@intel.com>
Acked-by: Huaxu Wan <huaxu.wan@intel.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit aaff23a95a upstream.
As noticed by Dan Carpenter <error27@gmail.com>, update_nl_seq()
currently contains an out of bounds read of the seq_aft_nl array
when looking for the oldest sequence number position.
Fix it to only compare valid positions.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit dce766af54 upstream.
normal users are currently allowed to set/modify ebtables rules.
Restrict it to processes with CAP_NET_ADMIN.
Note that this cannot be reproduced with unmodified ebtables binary
because it uses SOCK_RAW.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>