The Committed_AS field can underflow in certain situations:
> # while true; do cat /proc/meminfo | grep _AS; sleep 1; done | uniq -c
> 1 Committed_AS: 18446744073709323392 kB
> 11 Committed_AS: 18446744073709455488 kB
> 6 Committed_AS: 35136 kB
> 5 Committed_AS: 18446744073709454400 kB
> 7 Committed_AS: 35904 kB
> 3 Committed_AS: 18446744073709453248 kB
> 2 Committed_AS: 34752 kB
> 9 Committed_AS: 18446744073709453248 kB
> 8 Committed_AS: 34752 kB
> 3 Committed_AS: 18446744073709320960 kB
> 7 Committed_AS: 18446744073709454080 kB
> 3 Committed_AS: 18446744073709320960 kB
> 5 Committed_AS: 18446744073709454080 kB
> 6 Committed_AS: 18446744073709320960 kB
This happens because NR_CPUS can be greater than 1000 and
meminfo_proc_show() does not check for underflow.
But a calculation proportional to NR_CPUS is not a good one. In general,
the likelihood of lock contention is proportional to the number of online
CPUs, not the theoretical maximum number of CPUs (NR_CPUS).
The current kernel has a generic percpu_counter facility. Using it is the
right approach: it simplifies the code, and percpu_counter_read_positive()
does not suffer from the underflow issue.
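
To illustrate the arithmetic, here is a minimal user-space sketch, not the
kernel code. It assumes 4 kB pages and a 64-bit counter; pages_to_kb() and
read_positive() are hypothetical names, and clamping to zero is an
assumption about how a "positive" read could behave. A transiently negative
count of -57056 pages, reported as an unsigned kB value, gives
18446744073709323392, the first figure in the log above.

```c
#include <stdint.h>

/* Report a page count in kB, assuming 4 kB pages.
 * A negative count wraps around to a huge unsigned value. */
uint64_t pages_to_kb(int64_t pages)
{
    return (uint64_t)pages << 2;
}

/* Hypothetical clamp: never hand a negative approximate count to the
 * reporting code (the idea behind a "read positive" style accessor). */
int64_t read_positive(int64_t approx_count)
{
    return approx_count < 0 ? 0 : approx_count;
}
```

Calling pages_to_kb(read_positive(count)) instead of pages_to_kb(count)
keeps the reported value sane even while the approximate count is negative.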
Reported-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Eric B Munson <ebmunson@us.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: <stable@kernel.org> [All kernel versions]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some drivers that use the of_register_platform_driver() wrapper break on
sparc because the wrapper isn't in the header file. This patch moves it out
of the Microblaze and PowerPC implementations and makes it common code.
Fixes this sparc64 allmodconfig build error (at least):
drivers/leds/leds-gpio.c: In function `gpio_led_init':
drivers/leds/leds-gpio.c:295: error: implicit declaration of function `of_register_platform_driver'
drivers/leds/leds-gpio.c: In function `gpio_led_exit':
drivers/leds/leds-gpio.c:311: error: implicit declaration of function `of_unregister_platform_driver'
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes the problem introduced by commit 3bfacef412 (get rid of
special-casing the /sbin/loader on alpha): an osf/1 ecoff binary segfaults
when binfmt_aout is built as a module. That happens because the aout binary
handler ends up at the top of the binfmt list due to late registration, and
the kernel attempts to execute the binary without the preparatory work that
must be done by binfmt_loader.
Fixed by changing the registration order of the default binfmt handlers
using list_add_tail() and introducing an insert_binfmt() function which
places a new handler at the top of the binfmt list. This might be generally
useful for installing arch-specific frontends for default handlers or just
for overriding them.
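
The ordering idea can be sketched in user space as follows. This is only an
illustration under stated assumptions, not the kernel implementation:
struct fmt, struct fmt_list, register_fmt() and insert_fmt() are
hypothetical names, and a hand-rolled singly linked list stands in for the
kernel's list_head.

```c
#include <stddef.h>

/* Hypothetical handler record; the kernel uses struct linux_binfmt. */
struct fmt {
    const char *name;
    struct fmt *next;
};

struct fmt_list {
    struct fmt *head;   /* handlers are tried starting from here */
    struct fmt *tail;
};

/* Append: default handlers keep their registration order,
 * in the spirit of list_add_tail(). */
void register_fmt(struct fmt_list *l, struct fmt *f)
{
    f->next = NULL;
    if (l->tail)
        l->tail->next = f;
    else
        l->head = f;
    l->tail = f;
}

/* Prepend: a front-end handler is tried before all the others,
 * in the spirit of insert_binfmt(). */
void insert_fmt(struct fmt_list *l, struct fmt *f)
{
    f->next = l->head;
    l->head = f;
    if (!l->tail)
        l->tail = f;
}
```

With this scheme a late-registered default handler lands at the tail, so
it can no longer jump ahead of a front end that was explicitly inserted at
the head.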
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (24 commits)
e100: do not go D3 in shutdown unless system is powering off
netfilter: revised locking for x_tables
Bluetooth: Fix connection establishment with low security requirement
Bluetooth: Add different pairing timeout for Legacy Pairing
Bluetooth: Ensure that HCI sysfs add/del is preempt safe
net: Avoid extra wakeups of threads blocked in wait_for_packet()
net: Fix typo in net_device_ops description.
ipv4: Limit size of route cache hash table
Add reference to CAPI 2.0 standard
Documentation/isdn/INTERFACE.CAPI
update Documentation/isdn/00-INDEX
ixgbe: Fix WoL functionality for 82599 KX4 devices
veth: prevent oops caused by netdev destructor
xfrm: wrong hash value for temporary SA
forcedeth: tx timeout fix
net: Fix LL_MAX_HEADER for CONFIG_TR_MODULE
mlx4_en: Handle page allocation failure during receive
mlx4_en: Fix cleanup flow on cq activation
vlan: update vlan carrier state for admin up/down
netfilter: xt_recent: fix stack overread in compat code
...
The x_tables are organized with a table structure and per-cpu copies
of the counters and rules. On older kernels there was a reader/writer
lock per table, which was a performance bottleneck. In 2.6.30-rc, this
was converted to use RCU for the counters/rules, which solved the
performance problems for do_table but made replacing rules much slower
because of the necessary RCU grace period.
This version uses a per-cpu set of spinlocks and counters to allow
table processing to proceed without the cache thrashing of a global
reader lock, and it keeps the same performance for table updates.
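
A rough user-space sketch of the locking pattern, under stated assumptions:
C11 atomic_flag stands in for a kernel spinlock, NCPUS is an illustrative
constant, and struct percpu_slot, count_packet() and sum_packets() are
hypothetical names rather than the x_tables API. The packet path touches
only its own CPU's lock; the update path visits each CPU's lock in turn.

```c
#include <stdatomic.h>

#define NCPUS 4  /* illustrative; not how the kernel sizes per-cpu data */

struct percpu_slot {
    atomic_flag lock;       /* stand-in for a per-cpu spinlock */
    unsigned long packets;  /* per-cpu counter protected by the lock */
};

void slots_init(struct percpu_slot *slots, int ncpus)
{
    for (int i = 0; i < ncpus; i++) {
        atomic_flag_clear(&slots[i].lock);
        slots[i].packets = 0;
    }
}

static void slot_lock(struct percpu_slot *s)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ;
}

static void slot_unlock(struct percpu_slot *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Packet path: only this CPU's lock and counter are touched, so CPUs do
 * not bounce a shared lock cache line between them. */
void count_packet(struct percpu_slot *slots, int cpu)
{
    slot_lock(&slots[cpu]);
    slots[cpu].packets++;
    slot_unlock(&slots[cpu]);
}

/* Update path: take each CPU's lock in turn to read a stable counter. */
unsigned long sum_packets(struct percpu_slot *slots, int ncpus)
{
    unsigned long total = 0;

    for (int i = 0; i < ncpus; i++) {
        slot_lock(&slots[i]);
        total += slots[i].packets;
        slot_unlock(&slots[i]);
    }
    return total;
}
```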
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In 2.6.25 we added UDP mem accounting.
This unfortunately added a penalty when a frame is transmitted, since
at TX completion time we have to call sock_wfree() to perform the necessary
memory accounting. This calls sock_def_write_space() and ultimately the
scheduler if any thread is waiting on the socket.
Threads waiting for an incoming frame were scheduled, then had to sleep
again because the event was meaningless to them.
(All threads waiting on a socket use the same sk_sleep anchor.)
This adds a lot of extra wakeups and increases latencies, as noted
by Christoph Lameter, and slows down the softirq handler.
Reference: http://marc.info/?l=linux-netdev&m=124060437012283&w=2
Fortunately, Davide Libenzi recently added the concept of keyed wakeups
to the kernel, particularly for sockets (see commit 37e5540b3c,
"epoll keyed wakeups: make sockets use keyed wakeups").
Davide's goal was to optimize epoll, but this new wakeup infrastructure
can help non-epoll users as well, if they care to set up an appropriate
handler.
This patch introduces a new DEFINE_WAIT_FUNC() helper and uses it
in wait_for_packet(), so that only a relevant event can wake up a thread
blocked in this function.
The trace of function calls from the bnx2 TX completion path,
bnx2_poll_work(), is:
__kfree_skb()
skb_release_head_state()
sock_wfree()
sock_def_write_space()
__wake_up_sync_key()
__wake_up_common()
receiver_wake_function() : Stops here since thread is waiting for an INPUT
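
The filtering decision at the end of that trace can be sketched in user
space as below. This is an illustration, not the patch itself:
receiver_should_wake() is a hypothetical name, and the poll(2) flags from
<poll.h> stand in for the key bits a waker would pass.

```c
#include <poll.h>
#include <stdbool.h>

/* Decide whether a thread blocked waiting for an incoming packet should
 * be woken for an event described by `key`. */
bool receiver_should_wake(unsigned long key)
{
    /* A zero key means the waker gave no hint, so wake up to be safe. */
    if (key == 0)
        return true;

    /* Only input or error events matter to a receiver; a write-space
     * event (POLLOUT) from TX completion is ignored. */
    return (key & (POLLIN | POLLERR)) != 0;
}
```

With a filter like this, a TX completion that signals only write space
leaves the receiver asleep instead of waking it for nothing.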
Reported-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The old refok sections
.text.init.refok
.data.init.refok
.exit.text.refok
have been deprecated since commit 312b1485fb. After the other patches in
this patch series, nothing is put in these sections, so clean things up
by eliminating all the remaining references to them.
Signed-off-by: Tim Abbott <tabbott@mit.edu>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI: only save/restore existent registers in the PCIe capability
x86/PCI: don't bother with root quirks if _CRS is used
docbooks: add/fix PCI kernel-doc
PCI: cleanup debug output resources
x86/PCI: set_pci_bus_resources_arch_default cleanups
x86/PCI: Move set_pci_bus_resources_arch_default into arch/x86
x86/PCI: don't call e820_all_mapped with -1 in the mmconfig case
PCI quirk: disable MSI on VIA VT3364 chipsets
This patch is preparation for replacing all uses of ".head.text" or
".text.head" in the kernel with macros, so that the section name can
later be changed without having to touch a lot of the kernel.
Since some linker scripts do more complex things than referencing
HEAD_TEXT, we add a HEAD_TEXT_SECTION macro that just contains the
actual name.
I've defined HEAD_TEXT_SECTION in a new header,
include/linux/section-names.h, so that this section name only needs to
appear in one place. I anticipate creating similar macro structures
for a number of other section names.
The long-term goal here is to be able to change the kernel's magic
section names to those that are compatible with -ffunction-sections
-fdata-sections. This requires renaming all magic sections with names
of the form ".text.foo".
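
As a small illustration of the "name appears in one place" idea, here is a
user-space sketch. It is an assumption-laden example, not the contents of
include/linux/section-names.h: the string-literal form of the macro,
EXAMPLE_HEAD and early_entry() are hypothetical, and it assumes GCC or
Clang targeting ELF.

```c
/* Single definition of the section name; renaming the section later
 * means changing only this line. */
#define HEAD_TEXT_SECTION ".head.text"

/* Hypothetical helper: place a function in the head-text section. */
#define EXAMPLE_HEAD __attribute__((__section__(HEAD_TEXT_SECTION)))

/* This function is emitted into the section named by the macro. */
EXAMPLE_HEAD int early_entry(void)
{
    return 42;
}
```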
Signed-off-by: Tim Abbott <tabbott@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (34 commits)
ACPI, i915: Register ACPI video even when not modesetting
Revert "ACPICA: delete check for AML access to port 0x81-83"
I/O port protection: update for windows compatibility.
sony-laptop: always try to unblock rfkill on load
sony-laptop: fix bogus error message display on resume
ACPI: EC: Fix ACPI EC resume non-query interrupt message
sony-laptop: SNC input event 38 fix
sony-laptop: SNC 127 Initialization Fix
sony-laptop: Duplicate SNC 127 Event Fix
ACPI: prevent processor.max_cstate=0 boot crash
ACPI/hpet: prevent boot hang when hpet=force used on ICH-4M
ACPI: delete obsolete "bus master activity" proc field
ACPI: idle: mark_tsc_unstable() at init-time, not run-time
ACPI: add /sys/firmware/acpi/interrupts/sci_not counter
ACPI video: fix an error when the brightness levels on AC and on Battery are same
acpi-cpufreq: Do not let get_measured perf depend on internal variable
acpi-cpufreq: style-only: add parens to math expression
acpi-cpufreq: Cleanup: Use printk_once
x86, acpi_cpufreq: Fix the NULL pointer dereference in get_measured_perf
thinkpad-acpi: bump up version to 0.23
...
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: Fix potential inode allocation soft lockup in Orlov allocator
ext4: Make the extent validity check more paranoid
jbd: use SWRITE_SYNC_PLUG when writing synchronous revoke records
jbd2: use SWRITE_SYNC_PLUG when writing synchronous revoke records
ext4: really print the find_group_flex fallback warning only once
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6:
USB: pwc : do not pass stack allocated buffers to USB core.
USB: otg: Fix bug on remove path without transceiver
USB: correct error handling in cdc-wdm
USB: removal of tty->low_latency hack dating back to the old serial code
USB: serial: sierra driver bug fix for composite interface
USB: gadget: omap_udc uses platform_driver_probe()
USB: ci13xxx_udc: fix build error
USB: musb: Prevent multiple includes of musb.h
USB: pass mem_flags to dma_alloc_coherent
USB: g_file_storage: fix use-after-free bug when closing files
USB: ehci-sched.c: EHCI SITD scheduling bugfix
USB: fix mos7840 problem with minor numbers
USB: mos7840: add new device id
USB: musb: fix build when !CONFIG_PM
USB: musb: Remove my email address from few musb related drivers
USB: Gadget: MIPS CI13xxx UDC bugfixes
USB: Unusual Device support for Gold MP3 Player Energy
USB: serial: fix lifetime and locking problems
This patch adds the missing role attribute to the DCCP type; without it,
creating entries is of no use.
The attribute added is CTA_PROTOINFO_DCCP_ROLE which contains the
role of the conntrack original tuple.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
cfq-iosched: cache prio_tree root in cfqq->p_root
cfq-iosched: fix bug with aliased request and cooperation detection
cfq-iosched: clear ->prio_trees[] on cfqd alloc
block: fix intermittent dm timeout based oops
umem: fix request_queue lock warning
block: simplify I/O stat accounting
pktcdvd.h should include mempool.h
cfq-iosched: use the default seek distance when there aren't enough seek samples
cfq-iosched: make seek_mean converge more quickly
block: make blk_abort_queue() ignore non-request based devices
block: include empty disks in /proc/diskstats
bio: use bio_kmalloc() in copy/map functions
bio: fix bio_kmalloc()
block: fix queue bounce limit setting
block: fix SG_IO vector request data length handling
scatterlist: make sure sg_miter_next() doesn't return 0 sized mappings
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
powerpc: Fix modular build of ide-pmac when mediabay is built in
powerpc/pasemi: Fix build error on UP
powerpc: Make macintosh/mediabay driver depend on CONFIG_BLOCK
maintainers: Fix PS3 patterns
powerpc/ps3: Fix CONFIG_PS3_FLASH=n build warning
powerpc/32: Don't clobber personality flags on exec
powerpc: Fix crash on CPU hotplug
powerpc/85xx: Remove defconfigs that mpc85xx_{smp_}defconfig cover
powerpc/85xx: Added SMP defconfig
powerpc/85xx: Enabled a bunch of FSL specific drivers/options
powerpc/85xx: Updated generic mpc85xx_defconfig
powerpc: don't disable SATA interrupts on Freescale MPC8610 HPCD
fsl_rio: Pass the proper device to dma mapping routines
powerpc: Fix of_node_put() exit path in of_irq_map_one()
powerpc/5200: defconfig updates
powerpc/5200: Add FLASH nodes to lite5200 device tree
powerpc/device-tree: Document MTD nodes with multiple "reg" tuples
powerpc/of-device-tree: Factor MTD physmap bindings out of booting-without-of
powerpc/5200: Bring the legacy fsl_spi_platform_data hooks back
This simplifies the I/O stat accounting switching code and separates it
completely from the I/O scheduler switch code.
Requests are accounted according to the state of their request queue
at the time of the request allocation. There is no longer any need to
flush the request queue when switching the I/O accounting state.
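
The "sample the state at allocation time" idea can be sketched as follows.
This is a user-space illustration with hypothetical names (struct queue,
struct request, request_init(), request_complete()), not the block layer
code.

```c
#include <stdbool.h>

struct queue {
    bool io_stat_enabled;        /* current accounting switch */
    unsigned long completed_ios; /* accounted completions */
};

struct request {
    struct queue *q;
    bool io_stat;                /* snapshot taken at allocation */
};

/* Record the queue's accounting state once, when the request is set up. */
void request_init(struct request *rq, struct queue *q)
{
    rq->q = q;
    rq->io_stat = q->io_stat_enabled;
}

/* Completion consults the per-request snapshot, not the queue's current
 * state, so toggling the switch never affects in-flight requests. */
void request_complete(struct request *rq)
{
    if (rq->io_stat)
        rq->q->completed_ios++;
}
```

Because each request carries its own flag, in-flight requests stay
consistent across a switch and the queue does not need to be drained first.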
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Fix this build error:
In file included from fs/compat_ioctl.c:104:
include/linux/pktcdvd.h:285: error: expected specifier-qualifier-list before 'mempool_t'
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>