Commit Graph

26497 Commits

Author SHA1 Message Date
Miklos Szeredi
bfe8684869 filesystems: add set_nlink()
Replace remaining direct i_nlink updates with a new set_nlink()
updater function.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-11-02 12:53:43 +01:00
Andy Whitcroft
1fa1e7f615 readlinkat: ensure we return ENOENT for the empty pathname for normal lookups
Since the commit below which added O_PATH support to the *at() calls, the
error return for readlink/readlinkat for the empty pathname has switched
from ENOENT to EINVAL:

  commit 65cfc67223
  Author: Al Viro <viro@zeniv.linux.org.uk>
  Date:   Sun Mar 13 15:56:26 2011 -0400

    readlinkat(), fchownat() and fstatat() with empty relative pathnames

This is both unexpected for userspace and makes readlink/readlinkat
inconsistant with all other interfaces; and inconsistant with our stated
return for these pathnames.

As the readlinkat call does not have a flags parameter we cannot use the
AT_EMPTY_PATH approach used in the other calls.  Therefore expose whether
the original path is infact entry via a new user_path_at_empty() path
lookup function.  Use this to determine whether to default to EINVAL or
ENOENT for failures.

Addresses http://bugs.launchpad.net/bugs/817187

[akpm@linux-foundation.org: remove unused getname_flags()]
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-11-02 12:53:42 +01:00
Linus Torvalds
367069f16e Merge branch 'next/dt' of git://git.linaro.org/people/arnd/arm-soc
* 'next/dt' of git://git.linaro.org/people/arnd/arm-soc:
  ARM: gic: use module.h instead of export.h
  ARM: gic: fix irq_alloc_descs handling for sparse irq
  ARM: gic: add OF based initialization
  ARM: gic: add irq_domain support
  irq: support domains with non-zero hwirq base
  of/irq: introduce of_irq_init
  ARM: at91: add at91sam9g20 and Calao USB A9G20 DT support
  ARM: at91: dt: at91sam9g45 family and board device tree files
  arm/mx5: add device tree support for imx51 babbage
  arm/mx5: add device tree support for imx53 boards
  ARM: msm: Add devicetree support for msm8660-surf
  msm_serial: Add devicetree support
  msm_serial: Use relative resources for iomem

Fix up conflicts in arch/arm/mach-at91/{at91sam9260.c,at91sam9g45.c}
2011-11-01 21:02:35 -07:00
Linus Torvalds
81a3c10ce8 Merge branch 'next/cleanup2' of git://git.linaro.org/people/arnd/arm-soc
* 'next/cleanup2' of git://git.linaro.org/people/arnd/arm-soc: (31 commits)
  ARM: OMAP: Warn if omap_ioremap is called before SoC detection
  ARM: OMAP: Move set_globals initialization to happen in init_early
  ARM: OMAP: Map SRAM later on with ioremap_exec()
  ARM: OMAP: Remove calls to SRAM allocations for framebuffer
  ARM: OMAP: Avoid cpu_is_omapxxxx usage until map_io is done
  ARM: OMAP1: Use generic map_io, init_early and init_irq
  arm/dts: OMAP3+: Add mpu, dsp and iva nodes
  arm/dts: OMAP4: Add a main ocp entry bound to l3-noc driver
  ARM: OMAP2+: l3-noc: Add support for device-tree
  ARM: OMAP2+: board-generic: Add i2c static init
  ARM: OMAP2+: board-generic: Add DT support to generic board
  arm/dts: Add support for OMAP3 Beagle board
  arm/dts: Add initial device tree support for OMAP3 SoC
  arm/dts: Add support for OMAP4 SDP board
  arm/dts: Add support for OMAP4 PandaBoard
  arm/dts: Add initial device tree support for OMAP4 SoC
  ARM: OMAP: omap_device: Add a method to build an omap_device from a DT node
  ARM: OMAP: omap_device: Add omap_device_[alloc|delete] for DT integration
  of: Add helpers to get one string in multiple strings property
  ARM: OMAP2+: devices: Remove all omap_device_pm_latency structures
  ...

Fix up trivial header file conflicts in arch/arm/mach-omap2/board-generic.c
2011-11-01 20:58:25 -07:00
Linus Torvalds
cd9a0b6bd6 Merge branch 'next/pm' of git://git.linaro.org/people/arnd/arm-soc
* 'next/pm' of git://git.linaro.org/people/arnd/arm-soc: (66 commits)
  ARM: CSR: PM: use outer_resume to resume L2 cache
  ARM: CSR: call l2x0_of_init to init L2 cache of SiRFprimaII
  ARM: OMAP: voltage: voltage layer present, even when CONFIG_PM=n
  ARM: CSR: PM: add sleep entry for SiRFprimaII
  ARM: CSR: PM: save/restore irq status in suspend cycle
  ARM: CSR: PM: save/restore timer status in suspend cycle
  OMAP4: PM: TWL6030: add cmd register
  OMAP4: PM: TWL6030: fix ON/RET/OFF voltages
  OMAP4: PM: TWL6030: address 0V conversions
  OMAP4: PM: TWL6030: fix uv to voltage for >0x39
  OMAP4: PM: TWL6030: fix voltage conversion formula
  omap: voltage: add a stub header file for external/regulator use
  OMAP2+: VC: more registers are per-channel starting with OMAP5
  OMAP3+: voltage: update nominal voltage in voltdm_scale() not VC post-scale
  OMAP3+: voltage: rename omap_voltage_get_nom_volt -> voltdm_get_voltage
  OMAP3+: voltdm: final removal of omap_vdd_info
  OMAP3+: voltage: move/rename curr_volt from vdd_info into struct voltagedomain
  OMAP3+: voltage: rename scale and reset functions using voltdm_ prefix
  OMAP3+: VP: combine setting init voltage into common function
  OMAP3+: VP: remove unused omap_vp_get_curr_volt()
  ...

Fix up trivial conflict in arch/arm/mach-prima2/l2x0.c (code removal vs
edit)
2011-11-01 20:22:01 -07:00
Linus Torvalds
ac5761a650 Merge branch 'next/timer' of git://git.linaro.org/people/arnd/arm-soc
* 'next/timer' of git://git.linaro.org/people/arnd/arm-soc:
  clocksource: fixup ux500 build problems
  ARM: omap: use __devexit_p in dmtimer driver
  ARM: ux500: Reprogram timers upon resume
  ARM: plat-nomadik: timer: Export reset functions
  ARM: plat-nomadik: timer: Add support for periodic timers
  ARM: ux500: Move timer code to separate file
  ARM: ux500: add support for clocksource DBX500 PRCMU
  clocksource: add DBX500 PRCMU Timer support
  ARM: plat-nomadik: MTU sched_clock as an option
  ARM: OMAP: dmtimer: add error handling to export APIs
  ARM: OMAP: dmtimer: low-power mode support
  ARM: OMAP: dmtimer: skip reserved timers
  ARM: OMAP: dmtimer: pm_runtime support
  ARM: OMAP: dmtimer: switch-over to platform device driver
  ARM: OMAP: dmtimer: platform driver
  ARM: OMAP2+: dmtimer: convert to platform devices
  ARM: OMAP1: dmtimer: conversion to platform devices
  ARM: OMAP2+: dmtimer: add device names to flck nodes
  ARM: OMAP: Add support for dmtimer v2 ip
2011-11-01 20:18:05 -07:00
Linus Torvalds
b4beb4bf99 Merge branch 'for-linus/i2c-3.2' of git://git.fluff.org/bjdooks/linux
* 'for-linus/i2c-3.2' of git://git.fluff.org/bjdooks/linux: (47 commits)
  i2c-s3c2410: Add device tree support
  i2c-s3c2410: Keep a copy of platform data and use it
  i2c-nomadik: cosmetic coding style corrections
  i2c-au1550: dev_pm_ops conversion
  i2c-au1550: increase timeout waiting for master done
  i2c-au1550: remove unused ack_timeout
  i2c-au1550: remove usage of volatile keyword
  i2c-tegra: __iomem annotation fix
  i2c-eg20t: Add initialize processing in case i2c-error occurs
  i2c-eg20t: Fix flag setting issue
  i2c-eg20t: add stop sequence in case wait-event timeout occurs
  i2c-eg20t: Separate error processing
  i2c-eg20t: Fix 10bit access issue
  i2c-eg20t: Modify returned value s32 to long
  i2c-eg20t: Fix bus-idle waiting issue
  i2c-designware: Fix PCI core warning on suspend/resume
  i2c-designware: Add runtime power management support
  i2c-designware: Add support for Designware core behind PCI devices.
  i2c-designware: Push all register reads/writes into the core code.
  i2c-designware: Support multiple cores using same ISR
  ...
2011-11-01 15:07:19 -07:00
Linus Torvalds
f3c3f06705 Merge branch 'for-linus' of git://opensource.wolfsonmicro.com/regulator
* 'for-linus' of git://opensource.wolfsonmicro.com/regulator: (22 commits)
  regulator: Constify constraints name
  regulator: Fix possible nullpointer dereference in regulator_enable()
  regulator: gpio-regulator add dependency on GENERIC_GPIO
  regulator: Add module.h include to gpio-regulator
  regulator: Add driver for gpio-controlled regulators
  regulator: remove duplicate REG_CTRL2 defines in tps65023
  regulator: Clarify documentation for regulator-regulator supplies
  regulator: Fix some bitrot in the machine driver documentation
  regulator: tps65023: Added support for the similiar TPS65020 chip
  regulator: tps65023: Setting correct core regulator for tps65021
  regulator: tps65023: Set missing bit for update core-voltage
  regulator: tps65023: Fixes i2c configuration issues
  regulator: Add debugfs file showing the supply map table
  regulator: tps6586x: add SMx slew rate setting
  regulator: tps65023: Fixes i2c configuration issues
  regulator: tps6507x: Remove num_voltages array
  regulator: max8952: removed unused mutex.
  regulator: fix regulator/consumer.h kernel-doc warning
  regulator: Ensure enough enable time for max8649
  regulator: 88pm8607: Fix off-by-one value range checking in the case of no id is matched
  ...
2011-11-01 15:06:20 -07:00
Linus Torvalds
1c39865151 Merge branch 'pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
* 'pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  pstore: make pstore write function return normal success/fail value
  pstore: change mutex locking to spin_locks
  pstore: defer inserting OOPS entries into pstore
2011-11-01 10:52:29 -07:00
Linus Torvalds
f470f8d4e7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (62 commits)
  mlx4_core: Deprecate log_num_vlan module param
  IB/mlx4: Don't set VLAN in IBoE WQEs' control segment
  IB/mlx4: Enable 4K mtu for IBoE
  RDMA/cxgb4: Mark QP in error before disabling the queue in firmware
  RDMA/cxgb4: Serialize calls to CQ's comp_handler
  RDMA/cxgb3: Serialize calls to CQ's comp_handler
  IB/qib: Fix issue with link states and QSFP cables
  IB/mlx4: Configure extended active speeds
  mlx4_core: Add extended port capabilities support
  IB/qib: Hold links until tuning data is available
  IB/qib: Clean up checkpatch issue
  IB/qib: Remove s_lock around header validation
  IB/qib: Precompute timeout jiffies to optimize latency
  IB/qib: Use RCU for qpn lookup
  IB/qib: Eliminate divide/mod in converting idx to egr buf pointer
  IB/qib: Decode path MTU optimization
  IB/qib: Optimize RC/UC code by IB operation
  IPoIB: Use the right function to do DMA unmap pages
  RDMA/cxgb4: Use correct QID in insert_recv_cqe()
  RDMA/cxgb4: Make sure flush CQ entries are collected on connection close
  ...
2011-11-01 10:51:38 -07:00
Roland Dreier
504255f8d0 Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'fdr', 'ipath', 'ipoib', 'misc', 'mlx4', 'misc', 'nes', 'qib' and 'xrc' into for-next 2011-11-01 09:37:08 -07:00
Linus Torvalds
dc47d3810c Merge git://github.com/herbertx/crypto
* git://github.com/herbertx/crypto: (48 commits)
  crypto: user - Depend on NET instead of selecting it
  crypto: user - Add dependency on NET
  crypto: talitos - handle descriptor not found in error path
  crypto: user - Initialise match in crypto_alg_match
  crypto: testmgr - add twofish tests
  crypto: testmgr - add blowfish test-vectors
  crypto: Make hifn_795x build depend on !ARCH_DMA_ADDR_T_64BIT
  crypto: twofish-x86_64-3way - fix ctr blocksize to 1
  crypto: blowfish-x86_64 - fix ctr blocksize to 1
  crypto: whirlpool - count rounds from 0
  crypto: Add userspace report for compress type algorithms
  crypto: Add userspace report for cipher type algorithms
  crypto: Add userspace report for rng type algorithms
  crypto: Add userspace report for pcompress type algorithms
  crypto: Add userspace report for nivaead type algorithms
  crypto: Add userspace report for aead type algorithms
  crypto: Add userspace report for givcipher type algorithms
  crypto: Add userspace report for ablkcipher type algorithms
  crypto: Add userspace report for blkcipher type algorithms
  crypto: Add userspace report for ahash type algorithms
  ...
2011-11-01 09:24:41 -07:00
Linus Torvalds
094803e0aa Merge branch 'akpm' (Andrew's incoming)
Quoth Andrew:

 - Most of MM.  Still waiting for the poweroc guys to get off their
   butts and review some threaded hugepages patches.

 - alpha

 - vfs bits

 - drivers/misc

 - a few core kerenl tweaks

 - printk() features

 - MAINTAINERS updates

 - backlight merge

 - leds merge

 - various lib/ updates

 - checkpatch updates

* akpm: (127 commits)
  epoll: fix spurious lockdep warnings
  checkpatch: add a --strict check for utf-8 in commit logs
  kernel.h/checkpatch: mark strict_strto<foo> and simple_strto<foo> as obsolete
  llist-return-whether-list-is-empty-before-adding-in-llist_add-fix
  wireless: at76c50x: follow rename pack_hex_byte to hex_byte_pack
  fat: follow rename pack_hex_byte() to hex_byte_pack()
  security: follow rename pack_hex_byte() to hex_byte_pack()
  kgdb: follow rename pack_hex_byte() to hex_byte_pack()
  lib: rename pack_hex_byte() to hex_byte_pack()
  lib/string.c: fix strim() semantics for strings that have only blanks
  lib/idr.c: fix comment for ida_get_new_above()
  lib/percpu_counter.c: enclose hotplug only variables in hotplug ifdef
  lib/bitmap.c: quiet sparse noise about address space
  lib/spinlock_debug.c: print owner on spinlock lockup
  lib/kstrtox: common code between kstrto*() and simple_strto*() functions
  drivers/leds/leds-lp5521.c: check if reset is successful
  leds: turn the blink_timer off before starting to blink
  leds: save the delay values after a successful call to blink_set()
  drivers/leds/leds-gpio.c: use gpio_get_value_cansleep() when initializing
  drivers/leds/leds-lm3530.c: add __devexit_p where needed
  ...
2011-10-31 17:46:07 -07:00
Joe Perches
67d0a07544 kernel.h/checkpatch: mark strict_strto<foo> and simple_strto<foo> as obsolete
Mark obsolete/deprecated strict_strto<foo> and simple_strto<foo> functions
and macros as obsolete.

Update checkpatch to warn about their use.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:57 -07:00
Andrew Morton
fc23af34b0 llist-return-whether-list-is-empty-before-adding-in-llist_add-fix
clarify comment

Cc: Huang Ying <ying.huang@intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:57 -07:00
Andy Shevchenko
55036ba76b lib: rename pack_hex_byte() to hex_byte_pack()
As suggested by Andrew Morton in [1] there is better to have most
significant part first in the function name.

[1] https://lkml.org/lkml/2011/9/20/22

There is no functional change.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Mimi Zohar <zohar@us.ibm.com>
Cc: James Morris <jmorris@namei.org>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: "John W. Linville" <linville@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:56 -07:00
Magnus Damm
2b67c95b74 drivers/leds/leds-renesas-tpu.c: move Renesas TPU LED driver platform data
Use the platform_data include directory for the TPU LED driver, as
suggested by Paul Mundt.

Signed-off-by: Magnus Damm <damm@opensource.se>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:55 -07:00
Magnus Damm
f59b6f9f32 leds: Renesas TPU LED driver
Add V2 of the LED driver for a single timer channel for the TPU hardware
block commonly found in Renesas SoCs.

The driver has been written with optimal Power Management in mind, so to
save power the LED is driven as a regular GPIO pin in case of maximum
brightness and power off which allows the TPU hardware to be idle and
which in turn allows the clocks to be stopped and the power domain to be
turned off transparently.

Any other brightness level requires use of the TPU hardware in PWM mode.
TPU hardware device clocks and power are managed through Runtime PM.
System suspend and resume is known to be working - during suspend the LED
is set to off by the generic LED code.

The TPU hardware timer is equipeed with a 16-bit counter together with an
up-to-divide-by-64 prescaler which makes the hardware suitable for
brightness control.  Hardware blink is unsupported.

The LED PWM waveform has been verified with a Fluke 123 Scope meter on a
sh7372 Mackerel board.  Tested with experimental sh7372 A3SP power domain
patches.  Platform device bind/unbind tested ok.

V2 has been tested on the DS2 LED of the sh73a0-based AG5EVM.

[axel.lin@gmail.com: include linux/module.h]
Signed-off-by: Magnus Damm <damm@opensource.se>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:54 -07:00
Mark Brown
0556dc340e backlight: fix broken regulator API usage in l4f00242t03
The regulator support in the l4f00242t03 is very non-idiomatic.  Rather
than requesting the regulators based on the device name and the supply
names used by the device the driver requires boards to pass system
specific supply names around through platform data.  The driver also
conditionally requests the regulators based on this platform data, adding
unneeded conditional code to the driver.

Fix this by removing the platform data and converting to the standard
idiom, also updating all in tree users of the driver.  As no datasheet
appears to be available for the LCD I'm guessing the names for the
supplies based on the existing users and I've no ability to do anything
more than compile test.

The use of regulator_set_voltage() in the driver is also problematic,
since fixed voltages are required the expectation would be that the
voltages would be fixed in the constraints set by the machines rather than
manually configured by the driver, but is less problematic.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:54 -07:00
Joe Perches
b9075fa968 treewide: use __printf not __attribute__((format(printf,...)))
Standardize the style for compiler based printf format verification.
Standardized the location of __printf too.

Done via script and a little typing.

$ grep -rPl --include=*.[ch] -w "__attribute__" * | \
  grep -vP "^(tools|scripts|include/linux/compiler-gcc.h)" | \
  xargs perl -n -i -e 'local $/; while (<>) { s/\b__attribute__\s*\(\s*\(\s*format\s*\(\s*printf\s*,\s*(.+)\s*,\s*(.+)\s*\)\s*\)\s*\)/__printf($1, $2)/g ; print; }'

[akpm@linux-foundation.org: revert arch bits]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:54 -07:00
Mark Brown
ec400c9fab lis3lv02d: make regulator API usage unconditional
The regulator API contains a range of features for stubbing itself out
when not in use and for transparently restricting the actual effect of
regulator API calls where they can't be supported on a particular system
so that drivers don't need to individually implement this.  Simplify the
driver slightly by making use of this idiom.

The only in tree user is ecovec24 which does not use the regulator API.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Éric Piel <eric.piel@tremplin-utc.net>
Cc: Ilkka Koskinen <ilkka.koskinen@nokia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:52 -07:00
Kyungmin Park
d43a87e68e mm: compaction: make compact_zone_order() static
There's no compact_zone_order() user outside file scope, so make it static.

Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:49 -07:00
Mikulas Patocka
09f363c736 vmscan: fix shrinker callback bug in fs/super.c
The callback must not return -1 when nr_to_scan is zero. Fix the bug in
fs/super.c and add this requirement to the callback specification.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:49 -07:00
Joe Perches
3ee9a4f086 mm: neaten warn_alloc_failed
Add __attribute__((format (printf...) to the function to validate format
and arguments.  Use vsprintf extension %pV to avoid any possible message
interleaving.  Coalesce format string.  Convert printks/pr_warning to
pr_warn.

[akpm@linux-foundation.org: use the __printf() macro]
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:48 -07:00
Andrea Arcangeli
37a1c49a91 thp: mremap support and TLB optimization
This adds THP support to mremap (decreases the number of split_huge_page()
calls).

Here are also some benchmarks with a proggy like this:

===
#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

#define SIZE (5UL*1024*1024*1024)

int main()
{
        static struct timeval oldstamp, newstamp;
	long diffsec;
	char *p, *p2, *p3, *p4;
	if (posix_memalign((void **)&p, 2*1024*1024, SIZE))
		perror("memalign"), exit(1);
	if (posix_memalign((void **)&p2, 2*1024*1024, SIZE))
		perror("memalign"), exit(1);
	if (posix_memalign((void **)&p3, 2*1024*1024, 4096))
		perror("memalign"), exit(1);

	memset(p, 0xff, SIZE);
	memset(p2, 0xff, SIZE);
	memset(p3, 0x77, 4096);
	gettimeofday(&oldstamp, NULL);
	p4 = mremap(p, SIZE, SIZE, MREMAP_FIXED|MREMAP_MAYMOVE, p3);
	gettimeofday(&newstamp, NULL);
	diffsec = newstamp.tv_sec - oldstamp.tv_sec;
	diffsec = newstamp.tv_usec - oldstamp.tv_usec + 1000000 * diffsec;
	printf("usec %ld\n", diffsec);
	if (p == MAP_FAILED || p4 != p3)
	//if (p == MAP_FAILED)
		perror("mremap"), exit(1);
	if (memcmp(p4, p2, SIZE))
		printf("mremap bug\n"), exit(1);
	printf("ok\n");

	return 0;
}
===

THP on

 Performance counter stats for './largepage13' (3 runs):

          69195836 dTLB-loads                 ( +-   3.546% )  (scaled from 50.30%)
             60708 dTLB-load-misses           ( +-  11.776% )  (scaled from 52.62%)
         676266476 dTLB-stores                ( +-   5.654% )  (scaled from 69.54%)
             29856 dTLB-store-misses          ( +-   4.081% )  (scaled from 89.22%)
        1055848782 iTLB-loads                 ( +-   4.526% )  (scaled from 80.18%)
              8689 iTLB-load-misses           ( +-   2.987% )  (scaled from 58.20%)

        7.314454164  seconds time elapsed   ( +-   0.023% )

THP off

 Performance counter stats for './largepage13' (3 runs):

        1967379311 dTLB-loads                 ( +-   0.506% )  (scaled from 60.59%)
           9238687 dTLB-load-misses           ( +-  22.547% )  (scaled from 61.87%)
        2014239444 dTLB-stores                ( +-   0.692% )  (scaled from 60.40%)
           3312335 dTLB-store-misses          ( +-   7.304% )  (scaled from 67.60%)
        6764372065 iTLB-loads                 ( +-   0.925% )  (scaled from 79.00%)
              8202 iTLB-load-misses           ( +-   0.475% )  (scaled from 70.55%)

        9.693655243  seconds time elapsed   ( +-   0.069% )

grep thp /proc/vmstat
thp_fault_alloc 35849
thp_fault_fallback 0
thp_collapse_alloc 3
thp_collapse_alloc_failed 0
thp_split 0

thp_split 0 confirms no thp split despite plenty of hugepages allocated.

The measurement of only the mremap time (so excluding the 3 long
memset and final long 10GB memory accessing memcmp):

THP on

usec 14824
usec 14862
usec 14859

THP off

usec 256416
usec 255981
usec 255847

With an older kernel without the mremap optimizations (the below patch
optimizes the non THP version too).

THP on

usec 392107
usec 390237
usec 404124

THP off

usec 444294
usec 445237
usec 445820

I guess with a threaded program that sends more IPI on large SMP it'd
create an even larger difference.

All debug options are off except DEBUG_VM to avoid skewing the
results.

The only problem for native 2M mremap like it happens above both the
source and destination address must be 2M aligned or the hugepmd can't be
moved without a split but that is an hardware limitation.

[akpm@linux-foundation.org: coding-style nitpicking]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Johannes Weiner <jweiner@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:48 -07:00