Commit Graph

31006 Commits

Author SHA1 Message Date
Dmitry Torokhov
7774036808 Merge branch 'for-next' of git://github.com/rydberg/linux into next
Merge Henrik's updates to multitouch code. Even though Jiri already
pulled them in I need to do it too since my changes to evdev using
dynamic major would clash with them.
2012-10-01 14:40:51 -07:00
Henrik Rydberg
7e55bdedfa HID: Allow more fields in the hid report
Some recent hardware define more than 128 fields in the report
descriptor. Increase the limit to 256. This adds another kilobyte of
memory per report.

Tested-by: Ping Cheng <pingc@wacom.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
2012-09-19 19:50:20 +02:00
Henrik Rydberg
9ebf3d7687 HID: Add an input configured notification callback
A hid device may create several input devices, and a driver may need
to prepare or finalize the configuration per input device. Currently,
there is no sane way for a driver to know when a device has been
configured. This patch adds a callback providing that information.

Reviewed-and-tested-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
Tested-by: Ping Cheng <pingc@wacom.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
2012-09-19 19:50:20 +02:00
Henrik Rydberg
17a465a7f2 Input: MT - Get slot by key
Some devices use an internal key for tracking which cannot be directly
mapped to slots. This patch provides a key-to-slot mapping, which can
be used by drivers of such devices.

Reviewed-and-tested-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
2012-09-19 19:50:19 +02:00
Henrik Rydberg
7c1a87897c Input: MT - Add in-kernel tracking
With the INPUT_MT_TRACK flag set, the function input_mt_assign_slots()
can be used to match a new set of contacts against the currently used
slots. The algorithm used is based on Lagrange relaxation, and performs
very well in practice; slower than mtdev for a few corner cases, but
faster in most commonly occuring cases.

Tested-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
2012-09-19 19:50:19 +02:00
Henrik Rydberg
55e49089f4 Input: MT - Handle frame synchronization in core
Most MT drivers perform the same actions on frame synchronization.
Some actions, like dropping unseen contacts, are also unnecessarily
complex. Collect common frame synchronization tasks in a new function,
input_mt_sync_frame(). Depending on the flags set, it drops unseen
contacts and performs pointer emulation.

With init flags and frame synchronization in place, most MT drivers
can be simplified. First out are the bcm5974 and hid-multitouch
drivers, following this patch.

Reviewed-and-tested-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
Tested-by: Ping Cheng <pingc@wacom.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
2012-09-19 19:50:19 +02:00
Henrik Rydberg
b4adbbefc2 Input: MT - Add flags to input_mt_init_slots()
Preparing to move more repeated code into the mt core, add a flags
argument to the input_mt_slots_init() function.

Reviewed-and-tested-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
Tested-by: Ping Cheng <pingc@wacom.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
2012-09-19 19:50:18 +02:00
Henrik Rydberg
4369c64c79 Input: Send events one packet at a time
On heavy event loads, such as a multitouch driver, the irqsoff latency
can be as high as 250 us.  By accumulating a frame worth of data
before passing it on, the latency can be dramatically reduced.  As a
side effect, the special EV_SYN handling can be removed, since the
frame is now atomic.

This patch adds the events() handler callback and uses it if it
exists. The latency is improved by 50 us even without the callback.

Cc: Daniel Kurtz <djkurtz@chromium.org>
Tested-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
Tested-by: Ping Cheng <pingc@wacom.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
2012-09-19 19:50:18 +02:00
Henrik Rydberg
8d18fba282 Input: Break out MT data
Move all MT-related things to a separate place. This saves some
bytes for non-mt input devices, and prepares for new MT features.

Reviewed-and-tested-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
Tested-by: Ping Cheng <pingc@wacom.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
2012-09-19 19:50:17 +02:00
Dmitry Torokhov
e6c340171f Merge tag 'v3.6-rc4' into next
Linux 3.6-rc4

# gpg: Signature made Sat 01 Sep 2012 10:40:33 AM PDT using RSA key ID 00411886
# gpg: Good signature from "Linus Torvalds <torvalds@linux-foundation.org>"
2012-09-04 22:57:19 -07:00
Stephen Warren
a85442ade2 Input: tegra - move platform data header
Move the Tegra KBC platform data header out of arch/arm/mach-tegra, as
a pre-requisite of single zImage.

Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2012-09-04 22:44:06 -07:00
John Stultz
cee58483cf time: Move ktime_t overflow checking into timespec_valid_strict
Andreas Bombe reported that the added ktime_t overflow checking added to
timespec_valid in commit 4e8b14526c ("time: Improve sanity checking of
timekeeping inputs") was causing problems with X.org because it caused
timeouts larger then KTIME_T to be invalid.

Previously, these large timeouts would be clamped to KTIME_MAX and would
never expire, which is valid.

This patch splits the ktime_t overflow checking into a new
timespec_valid_strict function, and converts the timekeeping codes
internal checking to use this more strict function.

Reported-and-tested-by: Andreas Bombe <aeb@debian.org>
Cc: Zhouping Liu <zliu@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-01 10:24:48 -07:00
Linus Torvalds
9acb172543 Merge tag 'fixes-3.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull arm-soc fixes from Arnd Bergmann:
 "Bug fixes for various ARM platforms.  About half of these are for OMAP
  and submitted before but did not make it into v3.6-rc2."

* tag 'fixes-3.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (39 commits)
  ARM: ux500: don't select LEDS_GPIO for snowball
  ARM: imx: build i.MX6 functions only when needed
  ARM: imx: select CPU_FREQ_TABLE when needed
  ARM: imx: fix ksz9021rn_phy_fixup
  ARM: imx: build pm-imx5 code only when PM is enabled
  ARM: omap: allow building omap44xx without SMP
  ARM: dts: imx51-babbage: fix esdhc cd/wp properties
  ARM: imx6: spin the cpu until hardware takes it down
  ARM: ux500: Ensure probing of Audio devices when Device Tree is enabled
  ARM: ux500: Fix merge error, no matching driver name for 'snd_soc_u8500'
  ARM i.MX6q: Add virtual 1/3.5 dividers in the LDB clock path
  ARM: Kirkwood: fix Makefile.boot
  ARM: Kirkwood: Fix iconnect leds
  ARM: Orion: Set eth packet size csum offload limit
  ARM: mv78xx0: fix win_cfg_base prototype
  ARM: OMAP: dmtimers: Fix locking issue in omap_dm_timer_request*()
  ARM: mmp: fix potential NULL dereference
  ARM: OMAP4: Register the OPP table only for 4430 device
  cpufreq: OMAP: Handle missing frequency table on SMP systems
  ARM: OMAP4: sleep: Save the complete used register stack frame
  ...
2012-08-25 17:33:33 -07:00
Linus Torvalds
a7e546f175 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block-related fixes from Jens Axboe:

 - Improvements to the buffered and direct write IO plugging from
   Fengguang.

 - Abstract out the mapping of a bio in a request, and use that to
   provide a blk_bio_map_sg() helper.  Useful for mapping just a bio
   instead of a full request.

 - Regression fix from Hugh, fixing up a patch that went into the
   previous release cycle (and marked stable, too) attempting to prevent
   a loop in __getblk_slow().

 - Updates to discard requests, fixing up the sizing and how we align
   them.  Also a change to disallow merging of discard requests, since
   that doesn't really work properly yet.

 - A few drbd fixes.

 - Documentation updates.

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: replace __getblk_slow misfix by grow_dev_page fix
  drbd: Write all pages of the bitmap after an online resize
  drbd: Finish requests that completed while IO was frozen
  drbd: fix drbd wire compatibility for empty flushes
  Documentation: update tunable options in block/cfq-iosched.txt
  Documentation: update tunable options in block/cfq-iosched.txt
  Documentation: update missing index files in block/00-INDEX
  block: move down direct IO plugging
  block: remove plugging at buffered write time
  block: disable discard request merge temporarily
  bio: Fix potential memory leak in bio_find_or_create_slab()
  block: Don't use static to define "void *p" in show_partition_start()
  block: Add blk_bio_map_sg() helper
  block: Introduce __blk_segment_map_sg() helper
  fs/block-dev.c:fix performance regression in O_DIRECT writes to md block devices
  block: split discard into aligned requests
  block: reorganize rounding of max_discard_sectors
2012-08-25 11:36:43 -07:00
Linus Torvalds
b5bc0c7054 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "Mostly small fixes for the fallout of the timekeeping overhaul in 3.6
  along with stable fixes to address an accumulation problem and missing
  sanity checks for RTC readouts and user space provided values."

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: Avoid making adjustments if we haven't accumulated anything
  time: Avoid potential shift overflow with large shift values
  time: Fix casting issue in timekeeping_forward_now
  time: Ensure we normalize the timekeeper in tk_xtime_add
  time: Improve sanity checking of timekeeping inputs
2012-08-23 21:46:57 -07:00
Arnd Bergmann
57f0b20141 Merge branch 'randconfig/mach' into fixes
Small platform specific bug fixes for problems found in randconfig builds.

* randconfig/mach:
  ARM: ux500: don't select LEDS_GPIO for snowball
  ARM: imx: build i.MX6 functions only when needed
  ARM: imx: select CPU_FREQ_TABLE when needed
  ARM: imx: fix ksz9021rn_phy_fixup
  ARM: imx: build pm-imx5 code only when PM is enabled
  ARM: omap: allow building omap44xx without SMP

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2012-08-23 17:30:54 +02:00
Arnd Bergmann
c7a9b09b1a ARM: omap: allow building omap44xx without SMP
The new omap4 cpuidle implementation currently requires
ARCH_NEEDS_CPU_IDLE_COUPLED, which only works on SMP.

This patch makes it possible to build a non-SMP kernel
for that platform. This is not normally desired for
end-users but can be useful for testing.

Without this patch, building rand-0y2jSKT results in:

drivers/cpuidle/coupled.c: In function 'cpuidle_coupled_poke':
drivers/cpuidle/coupled.c:317:3: error: implicit declaration of function '__smp_call_function_single' [-Werror=implicit-function-declaration]

It's not clear if this patch is the best solution for
the problem at hand. I have made sure that we can now
build the kernel in all configurations, but that does
not mean it will actually work on an OMAP44xx.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Tony Lindgren <tony@atomide.com>
2012-08-23 17:16:42 +02:00
Arnd Bergmann
1e72fe1fca Merge branch 'imx/fixes-for-3.6' of git://git.linaro.org/people/shawnguo/linux-2.6 into fixes
* 'imx/fixes-for-3.6' of git://git.linaro.org/people/shawnguo/linux-2.6:
  ARM: dts: imx51-babbage: fix esdhc cd/wp properties
  ARM: imx6: spin the cpu until hardware takes it down
  ARM i.MX6q: Add virtual 1/3.5 dividers in the LDB clock path

Also updates to Linux 3.6-rc2

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2012-08-23 17:02:42 +02:00
Linus Torvalds
ad746be969 Merge tag 'nfs-for-3.6-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
 - NFSv3 mounts need to fail if the FSINFO rpc call fails
 - Ensure that the NFS commit cache gets torn down when we unload the
   NFS module.
 - Fix memory scribble issues when interrupting a LAYOUTGET rpc call
 - Fix NFSv4 legacy idmapper regressions
 - Fix issues with the NFSv4 getacl command
 - Fix a regression when using the legacy "mount -t nfs4"

* tag 'nfs-for-3.6-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv3: Ensure that do_proc_get_root() reports errors correctly
  NFSv4: Ensure that nfs4_alloc_client cleans up on error.
  NFS: return -ENOKEY when the upcall fails to map the name
  NFS: Clear key construction data if the idmap upcall fails
  NFSv4: Don't use private xdr_stream fields in decode_getacl
  NFSv4: Fix the acl cache size calculation
  NFSv4: Fix pointer arithmetic in decode_getacl
  NFS: Alias the nfs module to nfs4
  NFS: Fix a regression when loading the NFS v4 module
  NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_done
  pnfs-obj: Better IO pattern in case of unaligned offset
  NFS41: add pg_layout_private to nfs_pageio_descriptor
  pnfs: nfs4_proc_layoutget returns void
  pnfs: defer release of pages in layoutget
  nfs: tear down caches in nfs_init_writepagecache when allocation fails
2012-08-22 09:57:25 -07:00
Linus Torvalds
467e9e51d0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull assorted fixes - mostly vfs - from Al Viro:
 "Assorted fixes, with an unexpected detour into vfio refcounting logics
  (fell out when digging in an analog of eventpoll race in there)."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  task_work: add a scheduling point in task_work_run()
  fs: fix fs/namei.c kernel-doc warnings
  eventpoll: use-after-possible-free in epoll_create1()
  vfio: grab vfio_device reference *before* exposing the sucker via fd_install()
  vfio: get rid of vfio_device_put()/vfio_group_get_device* races
  vfio: get rid of open-coding kref_put_mutex
  introduce kref_put_mutex()
  vfio: don't dereference after kfree...
  mqueue: lift mnt_want_write() outside ->i_mutex, clean up a bit
2012-08-22 09:56:06 -07:00
Al Viro
8ad5db8a8d introduce kref_put_mutex()
equivalent of
	mutex_lock(mutex);
	if (!kref_put(kref, release))
		mutex_unlock(mutex);

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-08-22 10:24:41 -04:00
Dmitry Torokhov
c5b3533a82 Input: uinput - specify exact bit sizes on userspace APIs
Switch to using __u32/__s32 instead of ordinary 'int' in structures
forming userspace API.

Also internally make request_id unsigned int.

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2012-08-21 22:29:54 -07:00
Linus Torvalds
23dcfa61ba Merge branch 'akpm' (Andrew's patch-bomb)
Merge fixes from Andrew Morton.

Random drivers and some VM fixes.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (17 commits)
  mm: compaction: Abort async compaction if locks are contended or taking too long
  mm: have order > 0 compaction start near a pageblock with free pages
  rapidio/tsi721: fix unused variable compiler warning
  rapidio/tsi721: fix inbound doorbell interrupt handling
  drivers/rtc/rtc-rs5c348.c: fix hour decoding in 12-hour mode
  mm: correct page->pfmemalloc to fix deactivate_slab regression
  drivers/rtc/rtc-pcf2123.c: initialize dynamic sysfs attributes
  mm/compaction.c: fix deferring compaction mistake
  drivers/misc/sgi-xp/xpc_uv.c: SGI XPC fails to load when cpu 0 is out of IRQ resources
  string: do not export memweight() to userspace
  hugetlb: update hugetlbpage.txt
  checkpatch: add control statement test to SINGLE_STATEMENT_DO_WHILE_MACRO
  mm: hugetlbfs: correctly populate shared pmd
  cciss: fix incorrect scsi status reporting
  Documentation: update mount option in filesystem/vfat.txt
  mm: change nr_ptes BUG_ON to WARN_ON
  cs5535-clockevt: typo, it's MFGPT, not MFPGT
2012-08-21 17:22:22 -07:00
Linus Torvalds
8f8ba75ee2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking update from David Miller:
 "A couple weeks of bug fixing in there.  The largest chunk is all the
  broken crap Amerigo Wang found in the netpoll layer."

 1) netpoll and it's users has several serious bugs:
    a) uses GFP_KERNEL with locks held
    b) interfaces requiring interrupts disabled are called with them
       enabled
    c) and vice versa
    d) VLAN tag demuxing, as per all other RX packet input paths, is not
       applied

    All from Amerigo Wang.

 2) Hopefully cure the ipv4 mapped ipv6 address TCP early demux bugs for
    good, from Neal Cardwell.

 3) Unlike AF_UNIX, AF_PACKET sockets don't set a default credentials
    when the user doesn't specify one explicitly during sendmsg().
    Instead we attach an empty (zero) SCM credential block which is
    definitely not what we want.  Fix from Eric Dumazet.

 4) IPv6 illegally invokes netdevice notifiers with RCU lock held, fix
    from Ben Hutchings.

 5) inet_csk_route_child_sock() checks wrong inet options pointer, fix
    from Christoph Paasch.

 6) When AF_PACKET is used for transmit, packet loopback doesn't behave
    properly when a socket fanout is enabled, from Eric Leblond.

 7) On bluetooth l2cap channel create failure, we leak the socket, from
    Jaganath Kanakkassery.

 8) Fix all the netprio file handling bugs found by Al Viro, from John
    Fastabend.

 9) Several error return and NULL deref bug fixes in networking drivers
    from Julia Lawall.

10) A large smattering of struct padding et al.  kernel memory leaks to
    userspace found of Mathias Krause.

11) Conntrack expections in netfilter can access an uninitialized timer,
    fix from Pablo Neira Ayuso.

12) Several netfilter SIP tracker bug fixes from Patrick McHardy.

13) IPSEC ipv6 routes are not initialized correctly all the time,
    resulting in an OOPS in inet_putpeer().  Also from Patrick McHardy.

14) Bridging does rcu_dereference() outside of RCU protected area, from
    Stephen Hemminger.

15) Fix routing cache removal performance regression when looking up
    output routes that have a local destination.  From Zheng Yan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
  af_netlink: force credentials passing [CVE-2012-3520]
  ipv4: fix ip header ident selection in __ip_make_skb()
  ipv4: Use newinet->inet_opt in inet_csk_route_child_sock()
  tcp: fix possible socket refcount problem
  net: tcp: move sk_rx_dst_set call after tcp_create_openreq_child()
  net/core/dev.c: fix kernel-doc warning
  netconsole: remove a redundant netconsole_target_put()
  net: ipv6: fix oops in inet_putpeer()
  net/stmmac: fix issue of clk_get for Loongson1B.
  caif: Do not dereference NULL in chnl_recv_cb()
  af_packet: don't emit packet on orig fanout group
  drivers/net/irda: fix error return code
  drivers/net/wan/dscc4.c: fix error return code
  drivers/net/wimax/i2400m/fw.c: fix error return code
  smsc75xx: add missing entry to MAINTAINERS
  net: qmi_wwan: new devices: UML290 and K5006-Z
  net: sh_eth: Add eth support for R8A7779 device
  netdev/phy: skip disabled mdio-mux nodes
  dt: introduce for_each_available_child_of_node, of_get_next_available_child
  net: netprio: fix cgrp create and write priomap race
  ...
2012-08-21 16:46:08 -07:00
Mel Gorman
c67fe3752a mm: compaction: Abort async compaction if locks are contended or taking too long
Jim Schutt reported a problem that pointed at compaction contending
heavily on locks.  The workload is straight-forward and in his own words;

	The systems in question have 24 SAS drives spread across 3 HBAs,
	running 24 Ceph OSD instances, one per drive.  FWIW these servers
	are dual-socket Intel 5675 Xeons w/48 GB memory.  I've got ~160
	Ceph Linux clients doing dd simultaneously to a Ceph file system
	backed by 12 of these servers.

Early in the test everything looks fine

  procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
   r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
  31 15          0     287216        576   38606628    0    0     2  1158    2   14   1  3  95  0  0
  27 15          0     225288        576   38583384    0    0    18 2222016 203357 134876  11 56  17 15  0
  28 17          0     219256        576   38544736    0    0    11 2305932 203141 146296  11 49  23 17  0
   6 18          0     215596        576   38552872    0    0     7 2363207 215264 166502  12 45  22 20  0
  22 18          0     226984        576   38596404    0    0     3 2445741 223114 179527  12 43  23 22  0

and then it goes to pot

  procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
   r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
  163  8          0     464308        576   36791368    0    0    11 22210  866  536   3 13  79  4  0
  207 14          0     917752        576   36181928    0    0   712 1345376 134598 47367   7 90   1  2  0
  123 12          0     685516        576   36296148    0    0   429 1386615 158494 60077   8 84   5  3  0
  123 12          0     598572        576   36333728    0    0  1107 1233281 147542 62351   7 84   5  4  0
  622  7          0     660768        576   36118264    0    0   557 1345548 151394 59353   7 85   4  3  0
  223 11          0     283960        576   36463868    0    0    46 1107160 121846 33006   6 93   1  1  0

Note that system CPU usage is very high blocks being written out has
dropped by 42%. He analysed this with perf and found

  perf record -g -a sleep 10
  perf report --sort symbol --call-graph fractal,5
    34.63%  [k] _raw_spin_lock_irqsave
            |
            |--97.30%-- isolate_freepages
            |          compaction_alloc
            |          unmap_and_move
            |          migrate_pages
            |          compact_zone
            |          compact_zone_order
            |          try_to_compact_pages
            |          __alloc_pages_direct_compact
            |          __alloc_pages_slowpath
            |          __alloc_pages_nodemask
            |          alloc_pages_vma
            |          do_huge_pmd_anonymous_page
            |          handle_mm_fault
            |          do_page_fault
            |          page_fault
            |          |
            |          |--87.39%-- skb_copy_datagram_iovec
            |          |          tcp_recvmsg
            |          |          inet_recvmsg
            |          |          sock_recvmsg
            |          |          sys_recvfrom
            |          |          system_call
            |          |          __recv
            |          |          |
            |          |           --100.00%-- (nil)
            |          |
            |           --12.61%-- memcpy
             --2.70%-- [...]

There was other data but primarily it is all showing that compaction is
contended heavily on the zone->lock and zone->lru_lock.

commit [b2eef8c0: mm: compaction: minimise the time IRQs are disabled
while isolating pages for migration] noted that it was possible for
migration to hold the lru_lock for an excessive amount of time. Very
broadly speaking this patch expands the concept.

This patch introduces compact_checklock_irqsave() to check if a lock
is contended or the process needs to be scheduled. If either condition
is true then async compaction is aborted and the caller is informed.
The page allocator will fail a THP allocation if compaction failed due
to contention. This patch also introduces compact_trylock_irqsave()
which will acquire the lock only if it is not contended and the process
does not need to schedule.

Reported-by: Jim Schutt <jaschut@sandia.gov>
Tested-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-08-21 16:45:03 -07:00