Commit Graph

1759 Commits

Author SHA1 Message Date
NeilBrown 9cbb175088 blk: centralize non-request unplug handling.
Both md and umem has similar code for getting notified on an
blk_finish_plug event.
Centralize this code in block/ and allow each driver to
provide its distinctive difference.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-07-31 09:08:14 +02:00
Tejun Heo a051661ca6 blkcg: implement per-blkg request allocation
Currently, request_queue has one request_list to allocate requests
from regardless of blkcg of the IO being issued.  When the unified
request pool is used up, cfq proportional IO limits become meaningless
- whoever grabs the next request being freed wins the race regardless
of the configured weights.

This can be easily demonstrated by creating a blkio cgroup w/ very low
weight, put a program which can issue a lot of random direct IOs there
and running a sequential IO from a different cgroup.  As soon as the
request pool is used up, the sequential IO bandwidth crashes.

This patch implements per-blkg request_list.  Each blkg has its own
request_list and any IO allocates its request from the matching blkg
making blkcgs completely isolated in terms of request allocation.

* Root blkcg uses the request_list embedded in each request_queue,
  which was renamed to @q->root_rl from @q->rq.  While making blkcg rl
  handling a bit harier, this enables avoiding most overhead for root
  blkcg.

* Queue fullness is properly per request_list but bdi isn't blkcg
  aware yet, so congestion state currently just follows the root
  blkcg.  As writeback isn't aware of blkcg yet, this works okay for
  async congestion but readahead may get the wrong signals.  It's
  better than blkcg completely collapsing with shared request_list but
  needs to be improved with future changes.

* After this change, each block cgroup gets a full request pool making
  resource consumption of each cgroup higher.  This makes allowing
  non-root users to create cgroups less desirable; however, note that
  allowing non-root users to directly manage cgroups is already
  severely broken regardless of this patch - each block cgroup
  consumes kernel memory and skews IO weight (IO weights are not
  hierarchical).

v2: queue-sysfs.txt updated and patch description udpated as suggested
    by Vivek.

v3: blk_get_rl() wasn't checking error return from
    blkg_lookup_create() and may cause oops on lookup failure.  Fix it
    by falling back to root_rl on blkg lookup failures.  This problem
    was spotted by Rakesh Iyer <rni@google.com>.

v4: Updated to accomodate 458f27a982 "block: Avoid missed wakeup in
    request waitqueue".  blk_drain_queue() now wakes up waiters on all
    blkg->rl on the target queue.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-26 18:42:49 -04:00
Tejun Heo 5b788ce3e2 block: prepare for multiple request_lists
Request allocation is about to be made per-blkg meaning that there'll
be multiple request lists.

* Make queue full state per request_list.  blk_*queue_full() functions
  are renamed to blk_*rl_full() and takes @rl instead of @q.

* Rename blk_init_free_list() to blk_init_rl() and make it take @rl
  instead of @q.  Also add @gfp_mask parameter.

* Add blk_exit_rl() instead of destroying rl directly from
  blk_release_queue().

* Add request_list->q and make request alloc/free functions -
  blk_free_request(), [__]freed_request(), __get_request() - take @rl
  instead of @q.

This patch doesn't introduce any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-25 11:53:52 +02:00
Tejun Heo 8a5ecdd428 block: add q->nr_rqs[] and move q->rq.elvpriv to q->nr_rqs_elvpriv
Add q->nr_rqs[] which currently behaves the same as q->rq.count[] and
move q->rq.elvpriv to q->nr_rqs_elvpriv.  blk_drain_queue() is updated
to use q->nr_rqs[] instead of q->rq.count[].

These counters separates queue-wide request statistics from the
request list and allow implementation of per-queue request allocation.

While at it, properly indent fields of struct request_list.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-25 11:53:51 +02:00
Tejun Heo b1208b56f3 blkcg: inline bio_blkcg() and friends
Make bio_blkcg() and friends inline.  They all are very simple and
used only in few places.

This patch is to prepare for further updates to request allocation
path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-25 11:53:50 +02:00
Tejun Heo 7f4b35d155 block: allocate io_context upfront
Block layer very lazy allocation of ioc.  It waits until the moment
ioc is absolutely necessary; unfortunately, that time could be inside
queue lock and __get_request() performs unlock - try alloc - retry
dancing.

Just allocate it up-front on entry to block layer.  We're not saving
the rain forest by deferring it to the last possible moment and
complicating things unnecessarily.

This patch is to prepare for further updates to request allocation
path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-25 11:53:50 +02:00
Tejun Heo a06e05e6af block: refactor get_request[_wait]()
Currently, there are two request allocation functions - get_request()
and get_request_wait().  The former tries to allocate a request once
and the latter keeps retrying until it succeeds.  The latter wraps the
former and keeps retrying until allocation succeeds.

The combination of two functions deliver fallible non-wait allocation,
fallible wait allocation and unfailing wait allocation.  However,
given that forward progress is guaranteed, fallible wait allocation
isn't all that useful and in fact nobody uses it.

This patch simplifies the interface as follows.

* get_request() is renamed to __get_request() and is only used by the
  wrapper function.

* get_request_wait() is renamed to get_request().  It now takes
  @gfp_mask and retries iff it contains %__GFP_WAIT.

This patch doesn't introduce any functional change and is to prepare
for further updates to request allocation path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-25 11:53:49 +02:00
Tejun Heo 86072d8112 block: drop custom queue draining used by scsi_transport_{iscsi|fc}
iscsi_remove_host() uses bsg_remove_queue() which implements custom
queue draining.  fc_bsg_remove() open-codes mostly identical logic.

The draining logic isn't correct in that blk_stop_queue() doesn't
prevent new requests from being queued - it just stops processing, so
nothing prevents new requests to be queued after the logic determines
that the queue is drained.

blk_cleanup_queue() now implements proper queue draining and these
custom draining logics aren't necessary.  Drop them and use
bsg_unregister_queue() + blk_cleanup_queue() instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: James Smart <james.smart@emulex.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-25 11:53:48 +02:00
Tejun Heo a91a5ac685 mempool: add @gfp_mask to mempool_create_node()
mempool_create_node() currently assumes %GFP_KERNEL.  Its only user,
blk_init_free_list(), is about to be updated to use other allocation
flags - add @gfp_mask argument to the function.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-25 11:53:47 +02:00
Tejun Heo 159749937a blkcg: make root blkcg allocation use %GFP_KERNEL
Currently, blkcg_activate_policy() depends on %GFP_ATOMIC allocation
from __blkg_lookup_create() for root blkcg creation.  This could make
policy fail unnecessarily.

Make blkg_alloc() take @gfp_mask, __blkg_lookup_create() take an
optional @new_blkg for preallocated blkg, and blkcg_activate_policy()
preload radix tree and preallocate blkg with %GFP_KERNEL before trying
to create the root blkg.

v2: __blkg_lookup_create() was returning %NULL on blkg alloc failure
   instead of ERR_PTR() value.  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-25 11:53:46 +02:00
Tejun Heo 13589864be blkcg: __blkg_lookup_create() doesn't need radix preload
There's no point in calling radix_tree_preload() if preloading doesn't
use more permissible GFP mask.  Drop preloading from
__blkg_lookup_create().

While at it, drop sparse locking annotation which no longer applies.

v2: Vivek pointed out the odd preload usage.  Instead of updating,
    just drop it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-25 11:53:45 +02:00
Jan Kara 6d93592807 scsi: Silence unnecessary warnings about ioctl to partition
Sometimes, warnings about ioctls to partition happen often enough that they
form majority of the warnings in the kernel log and users complain. In some
cases warnings are about ioctls such as SG_IO so it's not good to get rid of
the warnings completely as they can ease debugging of userspace problems
when ioctl is refused.

Since I have seen warnings from lots of commands, including some proprietary
userspace applications, I don't think disallowing the ioctls for processes
with CAP_SYS_RAWIO will happen in the near future if ever. So lets just
stop warning for processes with CAP_SYS_RAWIO for which ioctl is allowed.

CC: Paolo Bonzini <pbonzini@redhat.com>
CC: James Bottomley <JBottomley@parallels.com>
CC: linux-scsi@vger.kernel.org
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-15 12:52:46 +02:00
Asias He 76aaa5101f block: Drop dead function blk_abort_queue()
This function was only used by btrfs code in btrfs_abort_devices()
(seems in a wrong way).

It was removed in commit d07eb91170,
So, Let's remove the dead code to avoid any confusion.

Changes in v2: update commit log, btrfs_abort_devices() was removed
already.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-kernel@vger.kernel.org
Cc: Chris Mason <chris.mason@oracle.com>
Cc: linux-btrfs@vger.kernel.org
Cc: David Sterba <dave@jikos.cz>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-15 08:46:23 +02:00
Asias He 5e5cfac0c6 block: Mitigate lock unbalance caused by lock switching
Commit 777eb1bf15 disconnects externally
supplied queue_lock before blk_drain_queue(). Switching the lock would
introduce lock unbalance because theads which have taken the external
lock might unlock the internal lock in the during the queue drain. This
patch mitigate this by disconnecting the lock after the queue draining
since queue draining makes a lot of request_queue users go away.

However, please note, this patch only makes the problem less likely to
happen. Anyone who still holds a ref might try to issue a new request on
a dead queue after the blk_cleanup_queue() finishes draining, the lock
unbalance might still happen in this case.

 =====================================
 [ BUG: bad unlock balance detected! ]
 3.4.0+ #288 Not tainted
 -------------------------------------
 fio/17706 is trying to release lock (&(&q->__queue_lock)->rlock) at:
 [<ffffffff81329372>] blk_queue_bio+0x2a2/0x380
 but there are no more locks to release!

 other info that might help us debug this:
 1 lock held by fio/17706:
  #0:  (&(&vblk->lock)->rlock){......}, at: [<ffffffff81327f1a>]
 get_request_wait+0x19a/0x250

 stack backtrace:
 Pid: 17706, comm: fio Not tainted 3.4.0+ #288
 Call Trace:
  [<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380
  [<ffffffff810dea49>] print_unlock_inbalance_bug+0xf9/0x100
  [<ffffffff810dfe4f>] lock_release_non_nested+0x1df/0x330
  [<ffffffff811dae24>] ? dio_bio_end_aio+0x34/0xc0
  [<ffffffff811d6935>] ? bio_check_pages_dirty+0x85/0xe0
  [<ffffffff811daea1>] ? dio_bio_end_aio+0xb1/0xc0
  [<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380
  [<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380
  [<ffffffff810e0079>] lock_release+0xd9/0x250
  [<ffffffff81a74553>] _raw_spin_unlock_irq+0x23/0x40
  [<ffffffff81329372>] blk_queue_bio+0x2a2/0x380
  [<ffffffff81328faa>] generic_make_request+0xca/0x100
  [<ffffffff81329056>] submit_bio+0x76/0xf0
  [<ffffffff8115470c>] ? set_page_dirty_lock+0x3c/0x60
  [<ffffffff811d69e1>] ? bio_set_pages_dirty+0x51/0x70
  [<ffffffff811dd1a8>] do_blockdev_direct_IO+0xbf8/0xee0
  [<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80
  [<ffffffff811dd4e5>] __blockdev_direct_IO+0x55/0x60
  [<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80
  [<ffffffff811d92e7>] blkdev_direct_IO+0x57/0x60
  [<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80
  [<ffffffff8114c6ae>] generic_file_aio_read+0x70e/0x760
  [<ffffffff810df7c5>] ? __lock_acquire+0x215/0x5a0
  [<ffffffff811e9924>] ? aio_run_iocb+0x54/0x1a0
  [<ffffffff8114bfa0>] ? grab_cache_page_nowait+0xc0/0xc0
  [<ffffffff811e82cc>] aio_rw_vect_retry+0x7c/0x1e0
  [<ffffffff811e8250>] ? aio_fsync+0x30/0x30
  [<ffffffff811e9936>] aio_run_iocb+0x66/0x1a0
  [<ffffffff811ea9b0>] do_io_submit+0x6f0/0xb80
  [<ffffffff8134de2e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [<ffffffff811eae50>] sys_io_submit+0x10/0x20
  [<ffffffff81a7c9e9>] system_call_fastpath+0x16/0x1b

Changes since v2: Update commit log to explain how the code is still
                  broken even if we delay the lock switching after the drain.
Changes since v1: Update commit log as Tejun suggested.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-15 08:46:22 +02:00
Asias He 458f27a982 block: Avoid missed wakeup in request waitqueue
After hot-unplug a stressed disk, I found that rl->wait[] is not empty
while rl->count[] is empty and there are theads still sleeping on
get_request after the queue cleanup. With simple debug code, I found
there are exactly nr_sleep - nr_wakeup of theads in D state. So there
are missed wakeup.

  $ dmesg | grep nr_sleep
  [   52.917115] ---> nr_sleep=1046, nr_wakeup=873, delta=173
  $ vmstat 1
  1 173  0 712640  24292  96172 0 0  0  0  419  757  0  0  0 100  0

To quote Tejun:

  Ah, okay, freed_request() wakes up single waiter with the assumption
  that after the wakeup there will at least be one successful allocation
  which in turn will continue the wakeup chain until the wait list is
  empty - ie. waiter wakeup is dependent on successful request
  allocation happening after each wakeup.  With queue marked dead, any
  woken up waiter fails the allocation path, so the wakeup chaining is
  lost and we're left with hung waiters. What we need is wake_up_all()
  after drain completion.

This patch fixes the missed wakeup by waking up all the theads which
are sleeping on wait queue after queue drain.

Changes in v2: Drop waitqueue_active() optimization

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Asias He <asias@redhat.com>

Fixed a bug by me, where stacked devices would oops on calling
blk_drain_queue() since ->rq.wait[] do not get initialized unless
it's a full queue setup.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-15 08:45:25 +02:00
Tejun Heo 27e1f9d1cc blkcg: drop local variable @q from blkg_destroy()
blkg_destroy() caches @blkg->q in local variable @q.  While there are
two places which needs @blkg->q, only lockdep_assert_held() used the
local variable leading to unused local variable warning if lockdep is
configured out.  Drop the local variable and just use @blkg->q
directly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Rakesh Iyer <rni@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-06 08:35:31 +02:00
Tejun Heo 9b2ea86bc9 blkcg: fix blkg_alloc() failure path
When policy data allocation fails in the middle, blkg_alloc() invokes
blkg_free() to destroy the half constructed blkg.  This ends up
calling pd_exit_fn() on policy datas which didn't go through
pd_init_fn().  Fix it by making blkg_alloc() call pd_init_fn()
immediately after each policy data allocation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-04 10:03:21 +02:00
Tejun Heo ffea73fc72 block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED
cfq may be built w/ or w/o blkcg support depending on
CONFIG_CFQ_CGROUP_IOSCHED.  If blkcg support is disabled, most of
related code is ifdef'd out but some part is left dangling -
blkcg_policy_cfq is left zero-filled and blkcg_policy_[un]register()
calls are made on it.

Feeding zero filled policy to blkcg_policy_register() is incorrect and
triggers the following WARN_ON() if CONFIG_BLK_CGROUP &&
!CONFIG_CFQ_GROUP_IOSCHED.

 ------------[ cut here ]------------
 WARNING: at block/blk-cgroup.c:867
 Modules linked in:
 Modules linked in:
 CPU: 3 Not tainted 3.4.0-09547-gfb21aff #1
 Process swapper/0 (pid: 1, task: 000000003ff80000, ksp: 000000003ff7f8b8)
 Krnl PSW : 0704100180000000 00000000003d76ca (blkcg_policy_register+0xca/0xe0)
	    R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3
 Krnl GPRS: 0000000000000000 00000000014b85ec 00000000014b85b0 0000000000000000
	    000000000096fb60 0000000000000000 00000000009a8e78 0000000000000048
	    000000000099c070 0000000000b6f000 0000000000000000 000000000099c0b8
	    00000000014b85b0 0000000000667580 000000003ff7fd98 000000003ff7fd70
 Krnl Code: 00000000003d76be: a7280001           lhi     %r2,1
	    00000000003d76c2: a7f4ffdf           brc     15,3d7680
	   #00000000003d76c6: a7f40001           brc     15,3d76c8
	   >00000000003d76ca: a7c8ffea           lhi     %r12,-22
	    00000000003d76ce: a7f4ffce           brc     15,3d766a
	    00000000003d76d2: a7f40001           brc     15,3d76d4
	    00000000003d76d6: a7c80000           lhi     %r12,0
	    00000000003d76da: a7f4ffc2           brc     15,3d765e
 Call Trace:
 ([<0000000000b6f000>] initcall_debug+0x0/0x4)
  [<0000000000989e8a>] cfq_init+0x62/0xd4
  [<00000000001000ba>] do_one_initcall+0x3a/0x170
  [<000000000096fb60>] kernel_init+0x214/0x2bc
  [<0000000000623202>] kernel_thread_starter+0x6/0xc
  [<00000000006231fc>] kernel_thread_starter+0x0/0xc
 no locks held by swapper/0/1.
 Last Breaking-Event-Address:
  [<00000000003d76c6>] blkcg_policy_register+0xc6/0xe0
 ---[ end trace b8ef4903fcbf9dd3 ]---

This patch fixes the problem by ensuring all blkcg support code is
inside CONFIG_CFQ_GROUP_IOSCHED.

* blkcg_policy_cfq declaration and blkg_to_cfqg() definition are moved
  inside the first CONFIG_CFQ_GROUP_IOSCHED block.  __maybe_unused is
  dropped from blkcg_policy_cfq decl.

* blkcg_deactivate_poilcy() invocation is moved inside ifdef.  This
  also makes the activation logic match cfq_init_queue().

* All blkcg_policy_[un]register() invocations are moved inside ifdef.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
LKML-Reference: <20120601112954.GC3535@osiris.boeblingen.de.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-04 10:02:29 +02:00
Tejun Heo fd7949564c block: fix return value on cfq_init() failure
cfq_init() would return zero after kmem cache creation failure.  Fix
so that it returns -ENOMEM.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-04 10:01:38 +02:00
Eric Dumazet 3c9c708c9f block: avoid infinite loop in get_task_io_context()
Calling get_task_io_context() on a exiting task which isn't %current can
loop forever. This triggers at boot time on my dev machine.

BUG: soft lockup - CPU#3 stuck for 22s ! [mountall.1603]

Fix this by making create_task_io_context() returns -EBUSY in this case
to break the loop.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-05-31 13:39:05 +02:00
Linus Torvalds 0d167518e0 Merge branch 'for-3.5/core' of git://git.kernel.dk/linux-block
Merge block/IO core bits from Jens Axboe:
 "This is a bit bigger on the core side than usual, but that is purely
  because we decided to hold off on parts of Tejun's submission on 3.4
  to give it a bit more time to simmer.  As a consequence, it's seen a
  long cycle in for-next.

  It contains:

   - Bug fix from Dan, wrong locking type.
   - Relax splice gifting restriction from Eric.
   - A ton of updates from Tejun, primarily for blkcg.  This improves
     the code a lot, making the API nicer and cleaner, and also includes
     fixes for how we handle and tie policies and re-activate on
     switches.  The changes also include generic bug fixes.
   - A simple fix from Vivek, along with a fix for doing proper delayed
     allocation of the blkcg stats."

Fix up annoying conflict just due to different merge resolution in
Documentation/feature-removal-schedule.txt

* 'for-3.5/core' of git://git.kernel.dk/linux-block: (92 commits)
  blkcg: tg_stats_alloc_lock is an irq lock
  vmsplice: relax alignement requirements for SPLICE_F_GIFT
  blkcg: use radix tree to index blkgs from blkcg
  blkcg: fix blkcg->css ref leak in __blkg_lookup_create()
  block: fix elvpriv allocation failure handling
  block: collapse blk_alloc_request() into get_request()
  blkcg: collapse blkcg_policy_ops into blkcg_policy
  blkcg: embed struct blkg_policy_data in policy specific data
  blkcg: mass rename of blkcg API
  blkcg: style cleanups for blk-cgroup.h
  blkcg: remove blkio_group->path[]
  blkcg: blkg_rwstat_read() was missing inline
  blkcg: shoot down blkgs if all policies are deactivated
  blkcg: drop stuff unused after per-queue policy activation update
  blkcg: implement per-queue policy activation
  blkcg: add request_queue->root_blkg
  blkcg: make request_queue bypassing on allocation
  blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing
  blkcg: make blkg_conf_prep() take @pol and return with queue lock held
  blkcg: remove static policy ID enums
  ...
2012-05-30 08:52:42 -07:00
Tejun Heo ff26eaadf4 blkcg: tg_stats_alloc_lock is an irq lock
tg_stats_alloc_lock nests inside queue lock and should always be held
with irq disabled.  throtl_pd_{init|exit}() were using non-irqsafe
spinlock ops which triggered inverse lock ordering via irq warning via
RCU freeing of blkg invoking throtl_pd_exit() w/o disabling IRQ.

Update both functions to use irq safe operations.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
LKML-Reference: <1335339396.16988.80.camel@lappy>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-05-23 12:16:21 +02:00
Linus Torvalds 88d6ae8dc3 Merge branch 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "cgroup file type addition / removal is updated so that file types are
  added and removed instead of individual files so that dynamic file
  type addition / removal can be implemented by cgroup and used by
  controllers.  blkio controller changes which will come through block
  tree are dependent on this.  Other changes include res_counter cleanup
  and disallowing kthread / PF_THREAD_BOUND threads to be attached to
  non-root cgroups.

  There's a reported bug with the file type addition / removal handling
  which can lead to oops on cgroup umount.  The issue is being looked
  into.  It shouldn't cause problems for most setups and isn't a
  security concern."

Fix up trivial conflict in Documentation/feature-removal-schedule.txt

* 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
  res_counter: Account max_usage when calling res_counter_charge_nofail()
  res_counter: Merge res_counter_charge and res_counter_charge_nofail
  cgroups: disallow attaching kthreadd or PF_THREAD_BOUND threads
  cgroup: remove cgroup_subsys->populate()
  cgroup: get rid of populate for memcg
  cgroup: pass struct mem_cgroup instead of struct cgroup to socket memcg
  cgroup: make css->refcnt clearing on cgroup removal optional
  cgroup: use negative bias on css->refcnt to block css_tryget()
  cgroup: implement cgroup_rm_cftypes()
  cgroup: introduce struct cfent
  cgroup: relocate __d_cgrp() and __d_cft()
  cgroup: remove cgroup_add_file[s]()
  cgroup: convert memcg controller to the new cftype interface
  memcg: always create memsw files if CONFIG_CGROUP_MEM_RES_CTLR_SWAP
  cgroup: convert all non-memcg controllers to the new cftype interface
  cgroup: relocate cftype and cgroup_subsys definitions in controllers
  cgroup: merge cft_release_agent cftype array into the base files array
  cgroup: implement cgroup_add_cftypes() and friends
  cgroup: build list of all cgroups under a given cgroupfs_root
  cgroup: move cgroup_clear_directory() call out of cgroup_populate_dir()
  ...
2012-05-22 17:40:19 -07:00
Linus Torvalds e60b9a0346 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "Just a random collection of bug-fixes and cleanups, nothing new in
  this merge request."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (46 commits)
  s390/ap: Fix wrong or missing comments
  s390/ap: move receive callback to message struct
  s390/dasd: re-prioritize partition detection message
  s390/qeth: reshuffle initialization
  s390/qeth: cleanup drv attr usage
  s390/claw: cleanup drv attr usage
  s390/lcs: cleanup drv attr usage
  s390/ctc: cleanup drv attr usage
  s390/ccwgroup: remove ccwgroup_create_from_string
  s390/qeth: stop using struct ccwgroup driver for discipline callbacks
  s390/qeth: switch to ccwgroup_create_dev
  s390/claw: switch to ccwgroup_create_dev
  s390/lcs: switch to ccwgroup_create_dev
  s390/ctcm: switch to ccwgroup_create_dev
  s390/ccwgroup: exploit ccwdev_by_dev_id
  s390/ccwgroup: introduce ccwgroup_create_dev
  s390: fix race on TIF_MCCK_PENDING
  s390/barrier: make use of fast-bcr facility
  s390/barrier: cleanup barrier functions
  s390/claw: remove "eieio" calls
  ...
2012-05-21 12:41:17 -07:00
Stefan Haberland 505e5ecfd3 s390/dasd: re-prioritize partition detection message
To avoid confusion while formatting a DASD device change the level of
the "Expected VOL1 label not found" message from warning to info.

Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-05-16 14:42:51 +02:00