Commit Graph

819 Commits

Author SHA1 Message Date
Peter Zijlstra
bf378d341e perf: Fix perf ring buffer memory ordering
The PPC64 people noticed a missing memory barrier and crufty old
comments in the perf ring buffer code. So update all the comments and
add the missing barrier.

When the architecture implements local_t using atomic_long_t there
will be double barriers issued; but short of introducing more
conditional barrier primitives this is the best we can do.

Reported-by: Victor Kaplansky <victork@il.ibm.com>
Tested-by: Victor Kaplansky <victork@il.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: michael@ellerman.id.au
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: anton@samba.org
Cc: benh@kernel.crashing.org
Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:01:19 +01:00
Linus Torvalds
0d645a8b82 Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband bugfix from Roland Dreier:
 "Disable not-quite-ready userspace ABI for IB flow steering"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/core: Temporarily disable create_flow/destroy_flow uverbs
2013-10-23 07:51:25 +01:00
Linus Torvalds
db10accfd2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Sorry I let so much accumulate, I was in Buffalo and wanted a few
  things to cook in my tree for a while before sending to you.  Anyways,
  it's a lot of little things as usual at this stage in the game"

 1) Make bonding MAINTAINERS entry reflect reality, from Andy
    Gospodarek.

 2) Fix accidental sock_put() on timewait mini sockets, from Eric
    Dumazet.

 3) Fix crashes in l2tp due to mis-handling of ipv4 mapped ipv6
    addresses, from François CACHEREUL.

 4) Fix heap overflow in __audit_sockaddr(), from the eagle eyed Dan
    Carpenter.

 5) tcp_shifted_skb() doesn't take handle FINs properly, from Eric
    Dumazet.

 6) SFC driver bug fixes from Ben Hutchings.

 7) Fix TX packet scheduling wedge after channel change in ath9k driver,
    from Felix Fietkau.

 8) Fix user after free in BPF JIT code, from Alexei Starovoitov.

 9) Source address selection test is reversed in
    __ip_route_output_key(), fix from Jiri Benc.

10) VLAN and CAN layer mis-size netlink attributes, from Marc
    Kleine-Budde.

11) Fix permission checks in sysctls to use current_euid() instead of
    current_uid().  From Eric W Biederman.

12) IPSEC policies can go away while a timer is still pending for them,
    add appropriate ref-counting to fix, from Steffen Klassert.

13) Fix mis-programming of FDR and RMCR registers on R8A7740 sh_eth
    chips, from Nguyen Hong Ky and Simon Horman.

14) MLX4 forgets to DMA unmap pages on RX, fix from Amir Vadai.

15) IPV6 GRE tunnel MTU upper limit is miscalculated, from Oussama
    Ghorbel.

16) Fix typo in fq_change(), we were assigning "initial quantum" to
    "quantum".  From Eric Dumazet.

17) Set a more appropriate sk_pacing_rate for non-TCP sockets, otherwise
    FQ packet scheduler does not pace those flows properly.  Also from
    Eric Dumazet.

18) rtlwifi miscalculates packet pointers, from Mark Cave-Ayland.

19) l2tp_xmit_skb() can be called from process context, not just softirq
    context, so we must always make sure to BH disable around it.  From
    Eric Dumazet.

20) On qdisc reset, we forget to purge the RB tree of SKBs in netem
    packet scheduler.  From Stephen Hemminger.

21) Fix info leak in farsync WAN driver ioctl() handler, from Dan
    Carpenter and Salva Peiró.

22) Fix PHY reset and other issues in dm9000 driver, from Nikita
    Kiryanov and Michael Abbott.

23) When hardware can do SCTP crc32 checksums, we accidently don't
    disable the csum offload when IPSEC transformations have been
    applied.  From Fan Du and Vlad Yasevich.

24) Tail loss probing in TCP leaves the socket in the wrong congestion
    avoidance state.  From Yuchung Cheng.

25) In CPSW driver, enable NAPI before interrupts are turned on, from
    Markus Pargmann.

26) Integer underflow and dual-assignment in YAM hamradio driver, from
    Dan Carpenter.

27) If we are going to mangle a packet in tcp_set_skb_tso_segs() we must
    unclone it.  This fixes various hard to track down crashes in
    drivers where the SKBs ->gso_segs was changing right from underneath
    the driver during TX queueing.  From Eric Dumazet.

28) Fix the handling of VLAN IDs, and in particular the special IDs 0
    and 4095, in the bridging layer.  From Toshiaki Makita.

29) Another info leak, this time in wanxl WAN driver, from Salva Peiró.

30) Fix race in socket credential passing, from Daniel Borkmann.

31) WHen NETLABEL is disabled, we don't validate CIPSO packets properly,
    from Seif Mazareeb.

32) Fix identification of fragmented frames in ipv4/ipv6 UDP
    Fragmentation Offload output paths, from Jiri Pirko.

33) Virtual Function fixes in bnx2x driver from Yuval Mintz and Ariel
    Elior.

34) When we removed the explicit neighbour pointer from ipv6 routes a
    slight regression was introduced for users such as IPVS, xt_TEE, and
    raw sockets.  We mix up the users requested destination address with
    the routes assigned nexthop/gateway.  From Julian Anastasov and
    Simon Horman.

35) Fix stack overruns in rt6_probe(), the issue is that can end up
    doing two full packet xmit paths at the same time when emitting
    neighbour discovery messages.  From Hannes Frederic Sowa.

36) davinci_emac driver doesn't handle IFF_ALLMULTI correctly, from
    Mariusz Ceier.

37) Make sure to set TCP sk_pacing_rate after the first legitimate RTT
    sample, from Neal Cardwell.

38) Wrong netlink attribute passed to xfrm_replay_verify_len(), from
    Steffen Klassert.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (152 commits)
  ax88179_178a: Add VID:DID for Samsung USB Ethernet Adapter
  ax88179_178a: Correct the RX error definition in RX header
  Revert "bridge: only expire the mdb entry when query is received"
  tcp: initialize passive-side sk_pacing_rate after 3WHS
  davinci_emac.c: Fix IFF_ALLMULTI setup
  mac802154: correct a typo in ieee802154_alloc_device() prototype
  ipv6: probe routes asynchronous in rt6_probe
  netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helper
  ipv6: fill rt6i_gateway with nexthop address
  ipv6: always prefer rt6i_gateway if present
  bnx2x: Set NETIF_F_HIGHDMA unconditionally
  bnx2x: Don't pretend during register dump
  bnx2x: Lock DMAE when used by statistic flow
  bnx2x: Prevent null pointer dereference on error flow
  bnx2x: Fix config when SR-IOV and iSCSI are enabled
  bnx2x: Fix Coalescing configuration
  bnx2x: Unlock VF-PF channel on MAC/VLAN config error
  bnx2x: Prevent an illegal pointer dereference during panic
  bnx2x: Fix Maximum CoS estimation for VFs
  drivers: net: cpsw: fix kernel warn during iperf test with interrupt pacing
  ...
2013-10-23 07:47:42 +01:00
Yann Droneaud
7afbddfae9 IB/core: Temporarily disable create_flow/destroy_flow uverbs
The create_flow/destroy_flow uverbs and the associated extensions to
the user-kernel verbs ABI are under review and are too experimental to
freeze at this point.

So userspace is not exposed to experimental features and an uinstable
ABI, temporarily disable this for v3.12 (with a Kconfig option behind
staging to reenable it if desired).

The feature will be enabled after proper cleanup for v3.13.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1381351016.git.ydroneaud@opteya.com
Link: http://marc.info/?i=cover.1381177342.git.ydroneaud@opteya.com

[ Add a Kconfig option to reenable these verbs.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-21 09:44:17 -07:00
Chris Wilson
bc5bd37ce4 drm: Pad drm_mode_get_connector to 64-bit boundary
Pavel Roskin reported that DRM_IOCTL_MODE_GETCONNECTOR was overwritting
the 4 bytes beyond the end of its structure with a 32-bit userspace
running on a 64-bit kernel. This is due to the padding gcc inserts as
the drm_mode_get_connector struct includes a u64 and its size is not a
natural multiple of u64s.

64-bit kernel:

sizeof(drm_mode_get_connector)=80, alignof=8
sizeof(drm_mode_get_encoder)=20, alignof=4
sizeof(drm_mode_modeinfo)=68, alignof=4

32-bit userspace:

sizeof(drm_mode_get_connector)=76, alignof=4
sizeof(drm_mode_get_encoder)=20, alignof=4
sizeof(drm_mode_modeinfo)=68, alignof=4

Fortuituously we can insert explicit padding to the tail of our
structures without breaking ABI.

Reported-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@redhat.com>
Cc: dri-devel@lists.freedesktop.org
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-10-18 07:42:23 +01:00
stephen hemminger
5bc3db5c9c tc: export tc_defact.h to userspace
Jamal sent patch to add tc user simple actions to iproute2
but required header was not being exported.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-02 16:39:11 -04:00
Linus Torvalds
b97b869a83 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "Nothing too major, radeon still has some dpm changes for off by
  default.

  Radeon, intel, msm:
   - radeon: a few more dpm fixes (still off by default), uvd fixes
   - i915: runtime warn backtrace and regression fix
   - msm: iommu changes fallout"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (27 commits)
  drm/msm: use drm_gem_dumb_destroy helper
  drm/msm: deal with mach/iommu.h removal
  drm/msm: Remove iommu include from mdp4_kms.c
  drm/msm: Odd PTR_ERR usage
  drm/i915: Fix up usage of SHRINK_STOP
  drm/radeon: fix hdmi audio on DCE3.0/3.1 asics
  drm/i915: preserve pipe A quirk in i9xx_set_pipeconf
  drm/i915/tv: clear adjusted_mode.flags
  drm/i915/dp: increase i2c-over-aux retry interval on AUX DEFER
  drm/radeon/cik: fix overflow in vram fetch
  drm/radeon: add missing hdmi callbacks for rv6xx
  drm/i915: Use a temporary va_list for two-pass string handling
  drm/radeon/uvd: lower msg&fb buffer requirements on UVD3
  drm/radeon: disable tests/benchmarks if accel is disabled
  drm/radeon: don't set default clocks for SI when DPM is disabled
  drm/radeon/dpm/ci: filter clocks based on voltage/clk dep tables
  drm/radeon/dpm/si: filter clocks based on voltage/clk dep tables
  drm/radeon/dpm/ni: filter clocks based on voltage/clk dep tables
  drm/radeon/dpm/btc: filter clocks based on voltage/clk dep tables
  drm/radeon/dpm: fetch the max clk from voltage dep tables helper
  ...
2013-09-29 10:02:40 -07:00
Dave Airlie
41ed7fe92f Merge branch 'drm-fixes-3.12' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
More radeon fixes for 3.12.  Kind of all over the place: UVD, DPM,
tiling, etc.

* 'drm-fixes-3.12' of git://people.freedesktop.org/~agd5f/linux:
  drm/radeon: fix hdmi audio on DCE3.0/3.1 asics
  drm/radeon/cik: fix overflow in vram fetch
  drm/radeon: add missing hdmi callbacks for rv6xx
  drm/radeon/uvd: lower msg&fb buffer requirements on UVD3
  drm/radeon: disable tests/benchmarks if accel is disabled
  drm/radeon: don't set default clocks for SI when DPM is disabled
  drm/radeon/dpm/ci: filter clocks based on voltage/clk dep tables
  drm/radeon/dpm/si: filter clocks based on voltage/clk dep tables
  drm/radeon/dpm/ni: filter clocks based on voltage/clk dep tables
  drm/radeon/dpm/btc: filter clocks based on voltage/clk dep tables
  drm/radeon/dpm: fetch the max clk from voltage dep tables helper
  drm/radeon: fix missed variable sized access
  drm/radeon: Make r100_cp_ring_info() and radeon_ring_gfx() safe (v2)
  drm/radeon/cik: Add tiling mode index for 1D tiled depth/stencil surfaces
  drm/radeon/cik: Fix encoding of number of banks in tiling configuration info
  drm/radeon/cik: Fix printing of client name on VM protection fault
  drm/radeon: additional gcc fixes for radeon_atombios.c
  drm/radeon: avoid UVD corruption on AGP cards using GPU gart
2013-09-28 14:45:30 +10:00
Michel Dänzer
42baf21d91 drm/radeon/cik: Add tiling mode index for 1D tiled depth/stencil surfaces
CIK uses a different index for 1D DST surfaces compared to SI.  Expose
the new index so libdrm_radeon can use it properly for userspace
drivers.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-09-20 17:33:40 -04:00
Peter Zijlstra
fa73158710 perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page'
Solve the problems around the broken definition of perf_event_mmap_page::
cap_usr_time and cap_usr_rdpmc fields which used to overlap, partially
fixed by:

  860f085b74 ("perf: Fix broken union in 'struct perf_event_mmap_page'")

The problem with the fix (merged in v3.12-rc1 and not yet released
officially), noticed by Vince Weaver is that the new behavior is
not detectable by new user-space, and that due to the reuse of the
field names it's easy to mis-compile a binary if old headers are used
on a new kernel or new headers are used on an old kernel.

To solve all that make this change explicit, detectable and self-contained,
by iterating the ABI the following way:

 - Always clear bit 0, and rename it to usrpage->cap_bit0, to at least not
   confuse old user-space binaries. RDPMC will be marked as unavailable
   to old binaries but that's within the ABI, this is a capability bit.

 - Rename bit 1 to ->cap_bit0_is_deprecated and always set it to 1, so new
   libraries can reliably detect that bit 0 is deprecated and perma-zero
   without having to check the kernel version.

 - Use bits 2, 3, 4 for the newly defined, correct functionality:

	cap_user_rdpmc		: 1, /* The RDPMC instruction can be used to read counts */
	cap_user_time		: 1, /* The time_* fields are used */
	cap_user_time_zero	: 1, /* The time_zero field is used */

 - Rename all the bitfield names in perf_event.h to be different from the
   old names, to make sure it's not possible to mis-compile it
   accidentally with old assumptions.

The 'size' field can then be used in the future to add new fields and it
will act as a natural ABI version indicator as well.

Also adjust tools/perf/ userspace for the new definitions, noticed by
Adrian Hunter.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Also-Fixed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/n/tip-zr03yxjrpXesOzzupszqglbv@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20 09:45:11 +02:00
Peter Zijlstra
c5ecceefdb perf: Update ABI comment
For some mysterious reason the sample_id field of PERF_RECORD_MMAP went AWOL.

Reported-by: Vince Weaver <vince@deater.net>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20 06:54:34 +02:00
Linus Torvalds
186844b292 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Two small fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix UAPI export of PERF_EVENT_IOC_ID
  perf/x86/intel: Fix Silvermont offcore masks
2013-09-18 11:22:53 -05:00
Vince Weaver
a8e0108cac perf: Fix UAPI export of PERF_EVENT_IOC_ID
Without the following patch I have problems compiling code using
the new PERF_EVENT_IOC_ID ioctl().  It looks like u64 was used
instead of __u64

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1309171450380.11444@vincent-weaver-1.um.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-18 11:29:07 +02:00
Linus Torvalds
8bf5e36d04 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input update from Dmitry Torokhov:
 "The only change is David Hermann's new EVIOCREVOKE evdev ioctl that
  allows safely passing file descriptors to input devices to session
  processes and later being able to stop delivery of events through
  these fds so that inactive sessions will no longer receive user input
  that does not belong to them"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: evdev - add EVIOCREVOKE ioctl
2013-09-15 07:13:39 -04:00
Linus Torvalds
26935fb06e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 4 from Al Viro:
 "list_lru pile, mostly"

This came out of Andrew's pile, Al ended up doing the merge work so that
Andrew didn't have to.

Additionally, a few fixes.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits)
  super: fix for destroy lrus
  list_lru: dynamically adjust node arrays
  shrinker: Kill old ->shrink API.
  shrinker: convert remaining shrinkers to count/scan API
  staging/lustre/libcfs: cleanup linux-mem.h
  staging/lustre/ptlrpc: convert to new shrinker API
  staging/lustre/obdclass: convert lu_object shrinker to count/scan API
  staging/lustre/ldlm: convert to shrinkers to count/scan API
  hugepage: convert huge zero page shrinker to new shrinker API
  i915: bail out earlier when shrinker cannot acquire mutex
  drivers: convert shrinkers to new count/scan API
  fs: convert fs shrinkers to new scan/count API
  xfs: fix dquot isolation hang
  xfs-convert-dquot-cache-lru-to-list_lru-fix
  xfs: convert dquot cache lru to list_lru
  xfs: rework buffer dispose list tracking
  xfs-convert-buftarg-lru-to-generic-code-fix
  xfs: convert buftarg LRU to generic code
  fs: convert inode and dentry shrinking to be node aware
  vmscan: per-node deferred work
  ...
2013-09-12 15:01:38 -07:00
Linus Torvalds
b7c09ad401 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "This is against 3.11-rc7, but was pulled and tested against your tree
  as of yesterday.  We do have two small incrementals queued up, but I
  wanted to get this bunch out the door before I hop on an airplane.

  This is a fairly large batch of fixes, performance improvements, and
  cleanups from the usual Btrfs suspects.

  We've included Stefan Behren's work to index subvolume UUIDs, which is
  targeted at speeding up send/receive with many subvolumes or snapshots
  in place.  It closes a long standing performance issue that was built
  in to the disk format.

  Mark Fasheh's offline dedup work is also here.  In this case offline
  means the FS is mounted and active, but the dedup work is not done
  inline during file IO.  This is a building block where utilities are
  able to ask the FS to dedup a series of extents.  The kernel takes
  care of verifying the data involved really is the same.  Today this
  involves reading both extents, but we'll continue to evolve the
  patches"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (118 commits)
  Btrfs: optimize key searches in btrfs_search_slot
  Btrfs: don't use an async starter for most of our workers
  Btrfs: only update disk_i_size as we remove extents
  Btrfs: fix deadlock in uuid scan kthread
  Btrfs: stop refusing the relocation of chunk 0
  Btrfs: fix memory leak of uuid_root in free_fs_info
  btrfs: reuse kbasename helper
  btrfs: return btrfs error code for dev excl ops err
  Btrfs: allow partial ordered extent completion
  Btrfs: convert all bug_ons in free-space-cache.c
  Btrfs: add support for asserts
  Btrfs: adjust the fs_devices->missing count on unmount
  Btrf: cleanup: don't check for root_refs == 0 twice
  Btrfs: fix for patch "cleanup: don't check the same thing twice"
  Btrfs: get rid of one BUG() in write_all_supers()
  Btrfs: allocate prelim_ref with a slab allocater
  Btrfs: pass gfp_t to __add_prelim_ref() to avoid always using GFP_ATOMIC
  Btrfs: fix race conditions in BTRFS_IOC_FS_INFO ioctl
  Btrfs: fix race between removing a dev and writing sbs
  Btrfs: remove ourselves from the cluster list under lock
  ...
2013-09-12 09:58:51 -07:00
Linus Torvalds
7b7a2f0a31 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
 "CIFS update including case insensitive file name matching improvements
  for UTF-8 to Unicode, various small cifs fixes, SMB2/SMB3 leasing
  improvements, support for following SMB2 symlinks, SMB3 packet signing
  improvements"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6: (25 commits)
  CIFS: Respect epoch value from create lease context v2
  CIFS: Add create lease v2 context for SMB3
  CIFS: Move parsing lease buffer to ops struct
  CIFS: Move creating lease buffer to ops struct
  CIFS: Store lease state itself rather than a mapped oplock value
  CIFS: Replace clientCanCache* bools with an integer
  [CIFS] quiet sparse compile warning
  cifs: Start using per session key for smb2/3 for signature generation
  cifs: Add a variable specific to NTLMSSP for key exchange.
  cifs: Process post session setup code in respective dialect functions.
  CIFS: convert to use le32_add_cpu()
  CIFS: Fix missing lease break
  CIFS: Fix a memory leak when a lease break comes
  cifs: add winucase_convert.pl to Documentation/ directory
  cifs: convert case-insensitive dentry ops to use new case conversion routines
  cifs: add new case-insensitive conversion routines that are based on wchar_t's
  [CIFS] Add Scott to list of cifs contributors
  cifs: Move and expand MAX_SERVER_SIZE definition
  cifs: Expand max share name length to 256
  cifs: Move string length definitions to uapi
  ...
2013-09-12 07:41:12 -07:00
Glauber Costa
3942c07ccf fs: bump inode and dentry counters to long
This series reworks our current object cache shrinking infrastructure in
two main ways:

 * Noticing that a lot of users copy and paste their own version of LRU
   lists for objects, we put some effort in providing a generic version.
   It is modeled after the filesystem users: dentries, inodes, and xfs
   (for various tasks), but we expect that other users could benefit in
   the near future with little or no modification.  Let us know if you
   have any issues.

 * The underlying list_lru being proposed automatically and
   transparently keeps the elements in per-node lists, and is able to
   manipulate the node lists individually.  Given this infrastructure, we
   are able to modify the up-to-now hammer called shrink_slab to proceed
   with node-reclaim instead of always searching memory from all over like
   it has been doing.

Per-node lru lists are also expected to lead to less contention in the lru
locks on multi-node scans, since we are now no longer fighting for a
global lock.  The locks usually disappear from the profilers with this
change.

Although we have no official benchmarks for this version - be our guest to
independently evaluate this - earlier versions of this series were
performance tested (details at
http://permalink.gmane.org/gmane.linux.kernel.mm/100537) yielding no
visible performance regressions while yielding a better qualitative
behavior in NUMA machines.

With this infrastructure in place, we can use the list_lru entry point to
provide memcg isolation and per-memcg targeted reclaim.  Historically,
those two pieces of work have been posted together.  This version presents
only the infrastructure work, deferring the memcg work for a later time,
so we can focus on getting this part tested.  You can see more about the
history of such work at http://lwn.net/Articles/552769/

Dave Chinner (18):
  dcache: convert dentry_stat.nr_unused to per-cpu counters
  dentry: move to per-sb LRU locks
  dcache: remove dentries from LRU before putting on dispose list
  mm: new shrinker API
  shrinker: convert superblock shrinkers to new API
  list: add a new LRU list type
  inode: convert inode lru list to generic lru list code.
  dcache: convert to use new lru list infrastructure
  list_lru: per-node list infrastructure
  shrinker: add node awareness
  fs: convert inode and dentry shrinking to be node aware
  xfs: convert buftarg LRU to generic code
  xfs: rework buffer dispose list tracking
  xfs: convert dquot cache lru to list_lru
  fs: convert fs shrinkers to new scan/count API
  drivers: convert shrinkers to new count/scan API
  shrinker: convert remaining shrinkers to count/scan API
  shrinker: Kill old ->shrink API.

Glauber Costa (7):
  fs: bump inode and dentry counters to long
  super: fix calculation of shrinkable objects for small numbers
  list_lru: per-node API
  vmscan: per-node deferred work
  i915: bail out earlier when shrinker cannot acquire mutex
  hugepage: convert huge zero page shrinker to new shrinker API
  list_lru: dynamically adjust node arrays

This patch:

There are situations in very large machines in which we can have a large
quantity of dirty inodes, unused dentries, etc.  This is particularly true
when umounting a filesystem, where eventually since every live object will
eventually be discarded.

Dave Chinner reported a problem with this while experimenting with the
shrinker revamp patchset.  So we believe it is time for a change.  This
patch just moves int to longs.  Machines where it matters should have a
big long anyway.

Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:29 -04:00
Linus Torvalds
7426d62871 Merge tag 'dm-3.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device-mapper updates from Mike Snitzer:
 "Add the ability to collect I/O statistics on user-defined regions of a
  device-mapper device.  This dm-stats code required the reintroduction
  of a div64_u64_rem() helper, but as a separate method that doesn't
  slow down div64_u64() -- especially on 32-bit systems.

  Allow the error target to replace request-based DM devices (e.g.
  multipath) in addition to bio-based DM devices.

  Various other small code fixes and improvements to thin-provisioning,
  DM cache and the DM ioctl interface"

* tag 'dm-3.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm stripe: silence a couple sparse warnings
  dm: add statistics support
  dm thin: always return -ENOSPC if no_free_space is set
  dm ioctl: cleanup error handling in table_load
  dm ioctl: increase granularity of type_lock when loading table
  dm ioctl: prevent rename to empty name or uuid
  dm thin: set pool read-only if breaking_sharing fails block allocation
  dm thin: prefix pool error messages with pool device name
  dm: allow error target to replace bio-based and request-based targets
  math64: New separate div64_u64_rem helper
  dm space map: optimise sm_ll_dec and sm_ll_inc
  dm btree: prefetch child nodes when walking tree for a dm_btree_del
  dm btree: use pop_frame in dm_btree_del to cleanup code
  dm cache: eliminate holes in cache structure
  dm cache: fix stacking of geometry limits
  dm thin: fix stacking of geometry limits
  dm thin: add data block size limits to Documentation
  dm cache: add data block size limits to code and Documentation
  dm cache: document metadata device is exclussive to a cache
  dm: stop using WQ_NON_REENTRANT
2013-09-10 13:06:15 -07:00
Linus Torvalds
300893b08f Merge tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs
Pull xfs updates from Ben Myers:
 "For 3.12-rc1 there are a number of bugfixes in addition to work to
  ease usage of shared code between libxfs and the kernel, the rest of
  the work to enable project and group quotas to be used simultaneously,
  performance optimisations in the log and the CIL, directory entry file
  type support, fixes for log space reservations, some spelling/grammar
  cleanups, and the addition of user namespace support.

   - introduce readahead to log recovery
   - add directory entry file type support
   - fix a number of spelling errors in comments
   - introduce new Q_XGETQSTATV quotactl for project quotas
   - add USER_NS support
   - log space reservation rework
   - CIL optimisations
  - kernel/userspace libxfs rework"

* tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs: (112 commits)
  xfs: XFS_MOUNT_QUOTA_ALL needed by userspace
  xfs: dtype changed xfs_dir2_sfe_put_ino to xfs_dir3_sfe_put_ino
  Fix wrong flag ASSERT in xfs_attr_shortform_getvalue
  xfs: finish removing IOP_* macros.
  xfs: inode log reservations are too small
  xfs: check correct status variable for xfs_inobt_get_rec() call
  xfs: inode buffers may not be valid during recovery readahead
  xfs: check LSN ordering for v5 superblocks during recovery
  xfs: btree block LSN escaping to disk uninitialised
  XFS: Assertion failed: first <= last && last < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568
  xfs: fix bad dquot buffer size in log recovery readahead
  xfs: don't account buffer cancellation during log recovery readahead
  xfs: check for underflow in xfs_iformat_fork()
  xfs: xfs_dir3_sfe_put_ino can be static
  xfs: introduce object readahead to log recovery
  xfs: Simplify xfs_ail_min() with list_first_entry_or_null()
  xfs: Register hotcpu notifier after initialization
  xfs: add xfs sb v4 support for dirent filetype field
  xfs: Add write support for dirent filetype field
  xfs: Add read-only support for dirent filetype field
  ...
2013-09-09 11:19:09 -07:00
Linus Torvalds
d75671e36e Merge tag 'vfio-v3.12-rc0' of git://github.com/awilliam/linux-vfio
Pull VFIO update from Alex Williamson:
 "VFIO updates include safer default file flags for VFIO device fds, an
  external user interface exported to allow other modules to hold
  references to VFIO groups, a fix to test for extended config space on
  PCIe and PCI-x, and new hot reset interfaces for PCI devices which
  allows the user to do PCI bus/slot resets when all of the devices
  affected by the reset are owned by the user.

  For this last feature, the PCI bus reset interface, I depend on
  changes already merged from Bjorn's PCI pull request.  I therefore
  merged my tree up to commit cb3e433, which I think was the correct
  action, but as Stephen Rothwell noted, I failed to provide a commit
  message indicating why the merge was required.  Sorry for that.
  Thanks, Alex"

* tag 'vfio-v3.12-rc0' of git://github.com/awilliam/linux-vfio:
  vfio: fix documentation
  vfio-pci: PCI hot reset interface
  vfio-pci: Test for extended config space
  vfio-pci: Use fdget() rather than eventfd_fget()
  vfio: Add O_CLOEXEC flag to vfio device fd
  vfio: use get_unused_fd_flags(0) instead of get_unused_fd()
  vfio: add external user support
2013-09-09 10:19:36 -07:00
Scott Lovenberg
cdf1246ffb cifs: Move and expand MAX_SERVER_SIZE definition
MAX_SERVER_SIZE has been moved to cifs_mount.h and renamed
CIFS_NI_MAXHOST for clarity.  It has been expanded to 1024 as the
previous value of 16 was very short.

Signed-off-by: Scott Lovenberg <scott.lovenberg@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-09-08 14:34:22 -05:00
Scott Lovenberg
54fcf270de cifs: Expand max share name length to 256
The old max share name length limit was 80 due to Windows NET SHARE
command not allowing more than that.  However, share names can be much
longer.  This is a more reasonable maximum share name length.

Signed-off-by: Scott Lovenberg <scott.lovenberg@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-09-08 14:34:17 -05:00
Scott Lovenberg
8c3a2b4c42 cifs: Move string length definitions to uapi
The max string length definitions for user name, domain name, password,
and share name have been moved into their own header file in uapi so the
mount helper can use autoconf to define them instead of keeping the
kernel side and userland side definitions in sync manually.  The names
have also been standardized with a "CIFS" prefix and "LEN" suffix.

Signed-off-by: Scott Lovenberg <scott.lovenberg@gmail.com>
Reviewed-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-09-08 14:34:11 -05:00
Linus Torvalds
b409624ad5 Merge git://git.infradead.org/users/willy/linux-nvme
Pull NVM Express driver update from Matthew Wilcox.

* git://git.infradead.org/users/willy/linux-nvme:
  NVMe: Merge issue on character device bring-up
  NVMe: Handle ioremap failure
  NVMe: Add pci suspend/resume driver callbacks
  NVMe: Use normal shutdown
  NVMe: Separate controller init from disk discovery
  NVMe: Separate queue alloc/free from create/delete
  NVMe: Group pci related actions in functions
  NVMe: Disk stats for read/write commands only
  NVMe: Bring up cdev on set feature failure
  NVMe: Fix checkpatch issues
  NVMe: Namespace IDs are unsigned
  NVMe: Update nvme_id_power_state with latest spec
  NVMe: Split header file into user-visible and kernel-visible pieces
  NVMe: Call nvme_process_cq from submission path
  NVMe: Remove "process_cq did something" message
  NVMe: Return correct value from interrupt handler
  NVMe: Disk IO statistics
  NVMe: Restructure MSI / MSI-X setup
  NVMe: Use kzalloc instead of kmalloc+memset
2013-09-07 20:19:02 -07:00