Commit Graph

2504 Commits

Author SHA1 Message Date
Linus Torvalds
c31eeaced2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking changes from David Miller:

 1) Multiply in netfilter IPVS can overflow when calculating destination
    weight.  From Simon Kirby.

 2) Use after free fixes in IPVS from Julian Anastasov.

 3) SFC driver bug fixes from Daniel Pieczko.

 4) Memory leak in pcan_usb_core failure paths, from Alexey Khoroshilov.

 5) Locking and encapsulation fixes to serial line CAN driver, from
    Andrew Naujoks.

 6) Duplex and VF handling fixes to bnx2x driver from Yaniv Rosner,
    Eilon Greenstein, and Ariel Elior.

 7) In lapb, if no other packets are outstanding, T1 timeouts actually
    stall things and no packet gets sent.  Fix from Josselin Costanzi.

 8) ICMP redirects should not make it to the socket error queues, from
    Duan Jiong.

 9) Fix bugs in skge DMA mapping error handling, from Nikulas Patocka.

10) Fix setting of VLAN priority field on via-rhine driver, from Roget
    Luethi.

11) Fix TX stalls and VLAN promisc programming in be2net driver from
    Ajit Khaparde.

12) Packet padding doesn't get handled correctly in new usbnet SG
    support code, from Ming Lei.

13) Fix races in netdevice teardown wrt.  network namespace closing.
    From Eric W.  Biederman.

14) Fix potential missed initialization of net_secret if not TCP
    connections are openned.  From Eric Dumazet.

15) Cinterion PLXX product ID in qmi_wwan driver is wrong, from
    Aleksander Morgado.

16) skb_cow_head() can change skb->data and thus packet header pointers,
    don't use stale ip_hdr reference in ip_tunnel code.

17) Backend state transition handling fixes in xen-netback, from Paul
    Durrant.

18) Packet offset for AH protocol is handled wrong in flow dissector,
    from Eric Dumazet.

19) Taking down an fq packet scheduler instance can leave stale packets
    in the queues, fix from Eric Dumazet.

20) Fix performance regressions introduced by TCP Small Queues.  From
    Eric Dumazet.

21) IPV6 GRE tunneling code calculates max_headroom incorrectly, from
    Hannes Frederic Sowa.

22) Multicast timer handlers in ipv4 and ipv6 can be the last and final
    reference to the ipv4/ipv6 specific network device state, so use the
    reference put that will check and release the object if the
    reference hits zero.  From Salam Noureddine.

23) Fix memory corruption in ip_tunnel driver, and use skb_push()
    instead of __skb_push() so that similar bugs are less hard to find.
    From Steffen Klassert.

24) Add forgotten hookup of rtnl_ops in SIT and ip6tnl drivers, from
    Nicolas Dichtel.

25) fq scheduler doesn't accurately rate limit in certain circumstances,
    from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
  pkt_sched: fq: rate limiting improvements
  ip6tnl: allow to use rtnl ops on fb tunnel
  sit: allow to use rtnl ops on fb tunnel
  ip_tunnel: Remove double unregister of the fallback device
  ip_tunnel_core: Change __skb_push back to skb_push
  ip_tunnel: Add fallback tunnels to the hash lists
  ip_tunnel: Fix a memory corruption in ip_tunnel_xmit
  qlcnic: Fix SR-IOV configuration
  ll_temac: Reset dma descriptors indexes on ndo_open
  skbuff: size of hole is wrong in a comment
  ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put
  ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put
  ethernet: moxa: fix incorrect placement of __initdata tag
  ipv6: gre: correct calculation of max_headroom
  powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file
  Revert "powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file"
  bonding: Fix broken promiscuity reference counting issue
  tcp: TSQ can use a dynamic limit
  dm9601: fix IFF_ALLMULTI handling
  pkt_sched: fq: qdisc dismantle fixes
  ...
2013-10-01 12:58:48 -07:00
Heiko Carstens
491f6f8e5f lockref: use arch_mutex_cpu_relax() in CMPXCHG_LOOP()
Make use of arch_mutex_cpu_relax() so architectures can override the
default cpu_relax() semantics.
This is especially useful for s390, where cpu_relax() means that we
yield() the current (virtual) cpu and therefore is very expensive,
and would contradict the whole purpose of the lockless cmpxchg loop.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2013-09-28 12:46:24 +02:00
Eric W. Biederman
730d7d3398 sysfs: Allow mounting without CONFIG_NET
In kobj_ns_current_may_mount the default should be to allow the mount.
The test is only for a single kobj_ns_type at a time, and unless there
is a reason to prevent it the mounting sysfs should be allowed.
Subsystems that are not registered can't have are not involved so can't
have a reason to prevent mounting sysfs.

This is a bug-fix to commit 7dc5dbc879 ("sysfs: Restrict mounting
sysfs") that came in via the userns tree during the 3.12 merge window.

Reported-and-tested-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-27 09:18:39 -07:00
Will Deacon
d2212b4dce lockref: allow relaxed cmpxchg64 variant for lockless updates
The 64-bit cmpxchg operation on the lockref is ordered by virtue of
hazarding between the cmpxchg operation and the reference count
manipulation. On weakly ordered memory architectures (such as ARM), it
can be of great benefit to omit the barrier instructions where they are
not needed.

This patch moves the lockless lockref code over to a cmpxchg64_relaxed
operation, which doesn't provide barrier semantics. If the operation
isn't defined, we simply #define it as the usual 64-bit cmpxchg macro.

Cc: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-27 09:15:01 -07:00
Andre Naujoks
c26d436cbf lib: introduce upper case hex ascii helpers
To be able to use the hex ascii functions in case sensitive environments
the array hex_asc_upper[] and the needed functions for hex_byte_pack_upper()
are introduced.

Signed-off-by: Andre Naujoks <nautsch2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-20 15:38:26 -04:00
Will Deacon
8f4c344696 lockref: use cmpxchg64 explicitly for lockless updates
The cmpxchg() function tends not to support 64-bit arguments on 32-bit
architectures.  This could be either due to use of unsigned long
arguments (like on ARM) or lack of instruction support (cmpxchgq on
x86).  However, these architectures may implement a specific cmpxchg64()
function to provide 64-bit cmpxchg support instead.

Since the lockref code requires a 64-bit cmpxchg and relies on the
architecture selecting ARCH_USE_CMPXCHG_LOCKREF, move to using cmpxchg64
instead of cmpxchg and allow 32-bit architectures to make use of the
lockless lockref implementation.

Cc: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-20 11:04:28 -05:00
Linus Torvalds
399a946edb Merge branch 'genirq' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull generic hardirq option removal from Martin Schwidefsky:
 "All architectures now use generic hardirqs, s390 has been last to
  switch.

  With that the code under !CONFIG_GENERIC_HARDIRQS and the related
  HAVE_GENERIC_HARDIRQS and GENERIC_HARDIRQS config options can be
  removed.  Yay!"

* 'genirq' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  Remove GENERIC_HARDIRQ config option
2013-09-13 07:31:38 -07:00
Linus Torvalds
0898d2aa9d Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This fixes a 7+ year race condition in the crypto API that causes
  sporadic crashes when multiple threads load the same algorithm.

  It also fixes the crct10dif algorithm again to prevent boot failures
  on systems where the initramfs tool ignores module softdeps"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: crct10dif - Add fallback for broken initrds
  crypto: api - Fix race condition in larval lookup
2013-09-13 07:11:14 -07:00
Martin Schwidefsky
0244ad004a Remove GENERIC_HARDIRQ config option
After the last architecture switched to generic hard irqs the config
options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
for !CONFIG_GENERIC_HARDIRQS can be removed.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-09-13 15:09:52 +02:00
Linus Torvalds
48efe453e6 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Lots of activity again this round for I/O performance optimizations
  (per-cpu IDA pre-allocation for vhost + iscsi/target), and the
  addition of new fabric independent features to target-core
  (COMPARE_AND_WRITE + EXTENDED_COPY).

  The main highlights include:

   - Support for iscsi-target login multiplexing across individual
     network portals
   - Generic Per-cpu IDA logic (kent + akpm + clameter)
   - Conversion of vhost to use per-cpu IDA pre-allocation for
     descriptors, SGLs and userspace page pointer list
   - Conversion of iscsi-target + iser-target to use per-cpu IDA
     pre-allocation for descriptors
   - Add support for generic COMPARE_AND_WRITE (AtomicTestandSet)
     emulation for virtual backend drivers
   - Add support for generic EXTENDED_COPY (CopyOffload) emulation for
     virtual backend drivers.
   - Add support for fast memory registration mode to iser-target (Vu)

  The patches to add COMPARE_AND_WRITE and EXTENDED_COPY support are of
  particular significance, which make us the first and only open source
  target to support the full set of VAAI primitives.

  Currently Linux clients are lacking upstream support to actually
  utilize these primitives.  However, with server side support now in
  place for folks like MKP + ZAB working on the client, this logic once
  reserved for the highest end of storage arrays, can now be run in VMs
  on their laptops"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (50 commits)
  target/iscsi: Bump versions to v4.1.0
  target: Update copyright ownership/year information to 2013
  iscsi-target: Bump default TCP listen backlog to 256
  target: Fix >= v3.9+ regression in PR APTPL + ALUA metadata write-out
  iscsi-target; Bump default CmdSN Depth to 64
  iscsi-target: Remove unnecessary wait_for_completion in iscsi_get_thread_set
  iscsi-target: Add thread_set->ts_activate_sem + use common deallocate
  iscsi-target: Fix race with thread_pre_handler flush_signals + ISCSI_THREAD_SET_DIE
  target: remove unused including <linux/version.h>
  iser-target: introduce fast memory registration mode (FRWR)
  iser-target: generalize rdma memory registration and cleanup
  iser-target: move rdma wr processing to a shared function
  target: Enable global EXTENDED_COPY setup/release
  target: Add Third Party Copy (3PC) bit in INQUIRY response
  target: Enable EXTENDED_COPY setup in spc_parse_cdb
  target: Add support for EXTENDED_COPY copy offload emulation
  target: Avoid non-existent tg_pt_gp_mem in target_alua_state_check
  target: Add global device list for EXTENDED_COPY
  target: Make helpers non static for EXTENDED_COPY command setup
  target: Make spc_parse_naa_6h_vendor_specific non static
  ...
2013-09-12 16:11:45 -07:00
Herbert Xu
26052f9b9b crypto: crct10dif - Add fallback for broken initrds
Unfortunately, even with a softdep some distros fail to include
the necessary modules in the initrd.  Therefore this patch adds
a fallback path to restore existing behaviour where we cannot
load the new crypto crct10dif algorithm.

In order to do this, the underlying crct10dif has been split out
from the crypto implementation so that it can be used on the
fallback path.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-09-12 15:31:34 +10:00
Sergey Senozhatsky
b34081f1cd lz4: fix compression/decompression signedness mismatch
LZ4 compression and decompression functions require different in
signedness input/output parameters: unsigned char for compression and
signed char for decompression.

Change decompression API to require "(const) unsigned char *".

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Kyungsik Lee <kyungsik.lee@lge.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yann Collet <yann.collet.73@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:45 -07:00
Jan Kara
5e4c0d9741 lib/radix-tree.c: make radix_tree_node_alloc() work correctly within interrupt
With users of radix_tree_preload() run from interrupt (block/blk-ioc.c is
one such possible user), the following race can happen:

radix_tree_preload()
...
radix_tree_insert()
  radix_tree_node_alloc()
    if (rtp->nr) {
      ret = rtp->nodes[rtp->nr - 1];
<interrupt>
...
radix_tree_preload()
...
radix_tree_insert()
  radix_tree_node_alloc()
    if (rtp->nr) {
      ret = rtp->nodes[rtp->nr - 1];

And we give out one radix tree node twice.  That clearly results in radix
tree corruption with different results (usually OOPS) depending on which
two users of radix tree race.

We fix the problem by making radix_tree_node_alloc() always allocate fresh
radix tree nodes when in interrupt.  Using preloading when in interrupt
doesn't make sense since all the allocations have to be atomic anyway and
we cannot steal nodes from process-context users because some users rely
on radix_tree_insert() succeeding after radix_tree_preload().
in_interrupt() check is somewhat ugly but we cannot simply key off passed
gfp_mask as that is acquired from root_gfp_mask() and thus the same for
all preload users.

Another part of the fix is to avoid node preallocation in
radix_tree_preload() when passed gfp_mask doesn't allow waiting.  Again,
preallocation in such case doesn't make sense and when preallocation would
happen in interrupt we could possibly leak some allocated nodes.  However,
some users of radix_tree_preload() require following radix_tree_insert()
to succeed.  To avoid unexpected effects for these users,
radix_tree_preload() only warns if passed gfp mask doesn't allow waiting
and we provide a new function radix_tree_maybe_preload() for those users
which get different gfp mask from different call sites and which are
prepared to handle radix_tree_insert() failure.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:36 -07:00
Cody P Schafer
7c993e11aa rbtree: allow tests to run as builtin
No reason require rbtree test code to be a module, allow it to be builtin
(streamlines my development process)

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Reviewed-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:20 -07:00
Cody P Schafer
a791a62fdf rbtree_test: add test for postorder iteration
Just check that we examine all nodes in the tree for the postorder
iteration.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Reviewed-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:20 -07:00
Cody P Schafer
9dee5c5151 rbtree: add postorder iteration functions
Postorder iteration yields all of a node's children prior to yielding the
node itself, and this particular implementation also avoids examining the
leaf links in a node after that node has been yielded.

In what I expect will be its most common usage, postorder iteration allows
the deletion of every node in an rbtree without modifying the rbtree nodes
(no _requirement_ that they be nulled) while avoiding referencing child
nodes after they have been "deleted" (most commonly, freed).

I have only updated zswap to use this functionality at this point, but
numerous bits of code (most notably in the filesystem drivers) use a hand
rolled postorder iteration that NULLs child links as it traverses the
tree.  Each of those instances could be replaced with this common
implementation.

1 & 2 add rbtree postorder iteration functions.
3 adds testing of the iteration to the rbtree runtime tests
4 allows building the rbtree runtime tests as builtins
5 updates zswap.

This patch:

Add postorder iteration functions for rbtree.  These are useful for safely
freeing an entire rbtree without modifying the tree at all.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Reviewed-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:19 -07:00
Alexandre Courbot
1431574a1c lib/decompressors: fix "no limit" output buffer length
When decompressing into memory, the output buffer length is set to some
arbitrarily high value (0x7fffffff) to indicate the output is, virtually,
unlimited in size.

The problem with this is that some platforms have their physical memory at
high physical addresses (0x80000000 or more), and that the output buffer
address and its "unlimited" length cannot be added without overflowing.
An example of this can be found in inflate_fast():

/* next_out is the output buffer address */
out = strm->next_out - OFF;
/* avail_out is the output buffer size. end will overflow if the output
 * address is >= 0x80000104 */
end = out + (strm->avail_out - 257);

This has huge consequences on the performance of kernel decompression,
since the following exit condition of inflate_fast() will be always true:

} while (in < last && out < end);

Indeed, "end" has overflowed and is now always lower than "out".  As a
result, inflate_fast() will return after processing one single byte of
input data, and will thus need to be called an unreasonably high number of
times.  This probably went unnoticed because kernel decompression is fast
enough even with this issue.

Nonetheless, adjusting the output buffer length in such a way that the
above pointer arithmetic never overflows results in a kernel decompression
that is about 3 times faster on affected machines.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: Jon Medhurst <tixy@linaro.org>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:58:38 -07:00
Gu Zheng
f2e1d2ac34 lib/crc32: update the comments of crc32_{be,le}_generic()
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:58:38 -07:00
Emilio López
5ab58acc40 lib/genalloc.c: correct dev_get_gen_pool documentation
The documentation mentions a "name" parameter, which does not exist.  This
commit removes such mention from the function documentation.

Signed-off-by: Emilio López <emilio@elopez.com.ar>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:58:38 -07:00
Joe Perches
ade34a3572 lib/genalloc.c: convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...)
Use the helper function instead of __GFP_ZERO.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:58:13 -07:00
Joonyoung Shim
674470d979 lib/genalloc.c: fix overflow of ending address of memory chunk
In struct gen_pool_chunk, end_addr means the end address of memory chunk
(inclusive), but in the implementation it is treated as address + size of
memory chunk (exclusive), so it points to the address plus one instead of
correct ending address.

The ending address of memory chunk plus one will cause overflow on the
memory chunk including the last address of memory map, e.g.  when starting
address is 0xFFF00000 and size is 0x100000 on 32bit machine, ending
address will be 0x100000000.

Use correct ending address like starting address + size - 1.

[akpm@linux-foundation.org: add comment to struct gen_pool_chunk:end_addr]
Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:35 -07:00
Linus Torvalds
7426d62871 Merge tag 'dm-3.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device-mapper updates from Mike Snitzer:
 "Add the ability to collect I/O statistics on user-defined regions of a
  device-mapper device.  This dm-stats code required the reintroduction
  of a div64_u64_rem() helper, but as a separate method that doesn't
  slow down div64_u64() -- especially on 32-bit systems.

  Allow the error target to replace request-based DM devices (e.g.
  multipath) in addition to bio-based DM devices.

  Various other small code fixes and improvements to thin-provisioning,
  DM cache and the DM ioctl interface"

* tag 'dm-3.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm stripe: silence a couple sparse warnings
  dm: add statistics support
  dm thin: always return -ENOSPC if no_free_space is set
  dm ioctl: cleanup error handling in table_load
  dm ioctl: increase granularity of type_lock when loading table
  dm ioctl: prevent rename to empty name or uuid
  dm thin: set pool read-only if breaking_sharing fails block allocation
  dm thin: prefix pool error messages with pool device name
  dm: allow error target to replace bio-based and request-based targets
  math64: New separate div64_u64_rem helper
  dm space map: optimise sm_ll_dec and sm_ll_inc
  dm btree: prefetch child nodes when walking tree for a dm_btree_del
  dm btree: use pop_frame in dm_btree_del to cleanup code
  dm cache: eliminate holes in cache structure
  dm cache: fix stacking of geometry limits
  dm thin: fix stacking of geometry limits
  dm thin: add data block size limits to Documentation
  dm cache: add data block size limits to code and Documentation
  dm cache: document metadata device is exclussive to a cache
  dm: stop using WQ_NON_REENTRANT
2013-09-10 13:06:15 -07:00
Linus Torvalds
4d7696f1b0 Merge tag 'md/3.12' of git://neil.brown.name/md
Pull md update from Neil Brown:
 "Headline item is multithreading for RAID5 so that more IO/sec can be
  supported on fast (SSD) devices.  Also TILE-Gx SIMD suppor for RAID6
  calculations and an assortment of bug fixes"

* tag 'md/3.12' of git://neil.brown.name/md:
  raid5: only wakeup necessary threads
  md/raid5: flush out all pending requests before proceeding with reshape.
  md/raid5: use seqcount to protect access to shape in make_request.
  raid5: sysfs entry to control worker thread number
  raid5: offload stripe handle to workqueue
  raid5: fix stripe release order
  raid5: make release_stripe lockless
  md: avoid deadlock when dirty buffers during md_stop.
  md: Don't test all of mddev->flags at once.
  md: Fix apparent cut-and-paste error in super_90_validate
  raid6/test: replace echo -e with printf
  RAID: add tilegx SIMD implementation of raid6
  md: fix safe_mode buglet.
  md: don't call md_allow_write in get_bitmap_file.
2013-09-10 13:03:41 -07:00
Kent Overstreet
798ab48eec idr: Percpu ida
Percpu frontend for allocating ids. With percpu allocation (that works),
it's impossible to guarantee it will always be possible to allocate all
nr_tags - typically, some will be stuck on a remote percpu freelist
where the current job can't get to them.

We do guarantee that it will always be possible to allocate at least
(nr_tags / 2) tags - this is done by keeping track of which and how many
cpus have tags on their percpu freelists. On allocation failure if
enough cpus have tags that there could potentially be (nr_tags / 2) tags
stuck on remote percpu freelists, we then pick a remote cpu at random to
steal from.

Note that there's no cpu hotplug notifier - we don't care, because
steal_tags() will eventually get the down cpu's tags. We _could_ satisfy
more allocations if we had a notifier - but we'll still meet our
guarantees and it's absolutely not a correctness issue, so I don't think
it's worth the extra code.

From akpm:

    "It looks OK to me (that's as close as I get to an ack :))

v6 changes:
  - Add #include <linux/cpumask.h> to include/linux/percpu_ida.h to
    make alpha/arc builds happy (Fengguang)
  - Move second (cpu >= nr_cpu_ids) check inside of first check scope
    in steal_tags() (akpm + nab)

v5 changes:
  - Change percpu_ida->cpus_have_tags to cpumask_t (kmo + akpm)
  - Add comment for percpu_ida_cpu->lock + ->nr_free (kmo + akpm)
  - Convert steal_tags() to use cpumask_weight() + cpumask_next() +
    cpumask_first() + cpumask_clear_cpu() (kmo + akpm)
  - Add comment for alloc_global_tags() (kmo + akpm)
  - Convert percpu_ida_alloc() to use cpumask_set_cpu() (kmo + akpm)
  - Convert percpu_ida_free() to use cpumask_set_cpu() (kmo + akpm)
  - Drop percpu_ida->cpus_have_tags allocation in percpu_ida_init()
    (kmo + akpm)
  - Drop percpu_ida->cpus_have_tags kfree in percpu_ida_destroy()
    (kmo + akpm)
  - Add comment for percpu_ida_alloc @ gfp (kmo + akpm)
  - Move to percpu_ida.c + percpu_ida.h (kmo + akpm + nab)

v4 changes:

  - Fix tags.c reference in percpu_ida_init (akpm)

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-09 14:29:15 -07:00
Linus Torvalds
89c5a9461d Merge tag 'arc-v3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC changes from Vineet Gupta:

 - ARC MM changes:
    - preparation for MMUv4 (accomodate new PTE bits, new cmds)
    - Rework the ASID allocation algorithm to remove asid-mm reverse map
 - Boilerplate code consolidation in Exception Handlers
 - Disable FRAME_POINTER for ARC
 - Unaligned Access Emulation for Big-Endian from Noam
 - Bunch of fixes (udelay, missing accessors) from Mischa

* tag 'arc-v3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: fix new Section mismatches in build (post __cpuinit cleanup)
  Kconfig.debug: Add FRAME_POINTER anti-dependency for ARC
  ARC: Fix __udelay calculation
  ARC: remove console_verbose() from setup_arch()
  ARC: Add read*_relaxed to asm/io.h
  ARC: Handle un-aligned user space access in BE.
  ARC: [ASID] Track ASID allocation cycles/generations
  ARC: [ASID] activate_mm() == switch_mm()
  ARC: [ASID] get_new_mmu_context() to conditionally allocate new ASID
  ARC: [ASID] Refactor the TLB paranoid debug code
  ARC: [ASID] Remove legacy/unused debug code
  ARC: No need to flush the TLB in early boot
  ARC: MMUv4 preps/3 - Abstract out TLB Insert/Delete
  ARC: MMUv4 preps/2 - Reshuffle PTE bits
  ARC: MMUv4 preps/1 - Fold PTE K/U access flags
  ARC: Code cosmetics (Nothing semantical)
  ARC: Entry Handler tweaks: Optimize away redundant IRQ_DISABLE_SAVE
  ARC: Exception Handlers Code consolidation
  ARC: Add some .gitignore entries
2013-09-09 09:05:33 -07:00