Commit Graph

1087 Commits

Author SHA1 Message Date
Linus Torvalds
4ef58d4e2a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits)
  tree-wide: fix misspelling of "definition" in comments
  reiserfs: fix misspelling of "journaled"
  doc: Fix a typo in slub.txt.
  inotify: remove superfluous return code check
  hdlc: spelling fix in find_pvc() comment
  doc: fix regulator docs cut-and-pasteism
  mtd: Fix comment in Kconfig
  doc: Fix IRQ chip docs
  tree-wide: fix assorted typos all over the place
  drivers/ata/libata-sff.c: comment spelling fixes
  fix typos/grammos in Documentation/edac.txt
  sysctl: add missing comments
  fs/debugfs/inode.c: fix comment typos
  sgivwfb: Make use of ARRAY_SIZE.
  sky2: fix sky2_link_down copy/paste comment error
  tree-wide: fix typos "couter" -> "counter"
  tree-wide: fix typos "offest" -> "offset"
  fix kerneldoc for set_irq_msi()
  spidev: fix double "of of" in comment
  comment typo fix: sybsystem -> subsystem
  ...
2009-12-09 19:43:33 -08:00
Vivek Goyal
878eaddd05 cfq-iosched: Do not access cfqq after freeing it
Fix a crash during boot reported by Jeff Moyer. Fix the issue of accessing
cfqq after freeing it.

Reported-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@carl.(none)>
2009-12-07 19:37:15 +01:00
Stephen Rothwell
accee7854b block: include linux/err.h to use ERR_PTR
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-07 09:47:07 +01:00
Jens Axboe
bb729bc98c cfq-iosched: use call_rcu() instead of doing grace period stall on queue exit
After the merge of the IO controller patches, booting on my megaraid
box ran much slower. Vivek Goyal traced it down to megaraid discovery
creating tons of devices, each suffering a grace period when they later
kill that queue (if no device is found).

So lets use call_rcu() to batch these deferred frees, instead of taking
the grace period hit for each one.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-06 09:54:19 +01:00
Vivek Goyal
846954b0a3 blkio: Allow CFQ group IO scheduling even when CFQ is a module
o Now issues of blkio controller and CFQ in module mode should be fixed.
  Enable the cfq group scheduling support in module mode.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-04 16:38:14 +01:00
Vivek Goyal
3e25206689 blkio: Implement dynamic io controlling policy registration
o One of the goals of block IO controller is that it should be able to
  support mulitple io control policies, some of which be operational at
  higher level in storage hierarchy.

o To begin with, we had one io controlling policy implemented by CFQ, and
  I hard coded the CFQ functions called by blkio. This created issues when
  CFQ is compiled as module.

o This patch implements a basic dynamic io controlling policy registration
  functionality in blkio. This is similar to elevator functionality where
  ioschedulers register the functions dynamically.

o Now in future, when more IO controlling policies are implemented, these
  can dynakically register with block IO controller.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-04 16:38:14 +01:00
Vivek Goyal
9d6a986c0b blkio: Export some symbols from blkio as its user CFQ can be a module
o blkio controller is inside the kernel and cfq makes use of interfaces
  exported by blkio. CFQ can be a module too, hence export symbols used
  by CFQ.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-04 16:38:14 +01:00
Louis Rilling
b69f229206 block: Fix io_context leak after failure of clone with CLONE_IO
With CLONE_IO, parent's io_context->nr_tasks is incremented, but never
decremented whenever copy_process() fails afterwards, which prevents
exit_io_context() from calling IO schedulers exit functions.

Give a task_struct to exit_io_context(), and call exit_io_context() instead of
put_io_context() in copy_process() cleanup path.

Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-04 16:36:18 +01:00
Louis Rilling
61cc74fbb8 block: Fix io_context leak after clone with CLONE_IO
With CLONE_IO, copy_io() increments both ioc->refcount and ioc->nr_tasks.
However exit_io_context() only decrements ioc->refcount if ioc->nr_tasks
reaches 0.

Always call put_io_context() in exit_io_context().

Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-04 16:36:18 +01:00
André Goddard Rosa
af901ca181 tree-wide: fix assorted typos all over the place
That is "success", "unknown", "through", "performance", "[re|un]mapping"
, "access", "default", "reasonable", "[con]currently", "temperature"
, "channel", "[un]used", "application", "example","hierarchy", "therefore"
, "[over|under]flow", "contiguous", "threshold", "enough" and others.

Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-12-04 15:39:55 +01:00
Shaohua Li
3c764b7a65 cfq-iosched: make nonrot check logic consistent
cfq_arm_slice_timer() has logic to disable idle window for SSD device. The same
thing should be done at cfq_select_queue() too, otherwise we will still see
idle window. This makes the nonrot check logic consistent in cfq.
Tests in a intel SSD with low_latency knob close, below patch can triple disk
thoughput for muti-thread sequential read.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-04 13:12:06 +01:00
Jens Axboe
237e5bc4e5 io controller: quick fix for blk-cgroup and modular CFQ
It's currently not an allowed configuration, so express that in Kconfig.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-04 10:07:38 +01:00
Jens Axboe
f2eecb9152 cfq-iosched: move IO controller declerations to a header file
They should not be declared inside some other file that's not related
to CFQ.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-04 10:06:35 +01:00
Jens Axboe
2f5ea47712 cfq-iosched: fix compile problem with !CONFIG_CGROUP
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03 21:07:17 +01:00
Vivek Goyal
c04645e592 blkio: Wait on sync-noidle queue even if rq_noidle = 1
o rq_noidle() is supposed to tell cfq that do not expect a request after this
  one, hence don't idle. But this does not seem to work very well. For example
  for direct random readers, rq_noidle = 1 but there is next request coming
  after this. Not idling, leads to a group not getting its share even if
  group_isolation=1.

o The right solution for this issue is to scan the higher layers and set
  right flag (WRITE_SYNC or WRITE_ODIRECT). For the time being, this single
  line fix helps. This should not have any significant impact when we are
  not using cgroups. I will later figure out IO paths in higher layer and
  fix it.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03 19:28:53 +01:00
Vivek Goyal
ae30c28655 blkio: Implement group_isolation tunable
o If a group is running only a random reader, then it will not have enough
  traffic to keep disk busy and we will reduce overall throughput. This
  should result in better latencies for random reader though. If we don't
  idle on random reader service tree, then this random reader will experience
  large latencies if there are other groups present in system with sequential
  readers running in these.

o One solution suggested by corrado is that by default keep the random readers
  or sync-noidle workload in root group so that during one dispatch round
  we idle only once on sync-noidle tree. This means that all the sync-idle
  workload queues will be in their respective group and we will see service
  differentiation in those but not on sync-noidle workload.

o Provide a tunable group_isolation. If set, this will make sure that even
  sync-noidle queues go in their respective group and we wait on these. This
  provides stronger isolation between groups but at the expense of throughput
  if group does not have enough traffic to keep the disk busy.

o By default group_isolation = 0

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03 19:28:53 +01:00
Vivek Goyal
f26bd1f0a3 blkio: Determine async workload length based on total number of queues
o Async queues are not per group. Instead these are system wide and maintained
  in root group. Hence their workload slice length should be calculated
  based on total number of queues in the system and not just queues in the
  root group.

o As root group's default weight is 1000, make sure to charge async queue
  more in terms of vtime so that it does not get more time on disk because
  root group has higher weight.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03 19:28:53 +01:00
Vivek Goyal
f75edf2dc8 blkio: Wait for cfq queue to get backlogged if group is empty
o If a queue consumes its slice and then gets deleted from service tree, its
  associated group will also get deleted from service tree if this was the
  only queue in the group. That will make group loose its share.

o For the queues on which we have idling on and if these have used their
  slice, wait a bit for these queues to get backlogged again and then
  expire these queues so that group does not loose its share.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03 19:28:53 +01:00
Vivek Goyal
f8d461d692 blkio: Propagate cgroup weight updation to cfq groups
o Propagate blkio cgroup weight updation to associated cfq groups.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03 19:28:53 +01:00
Vivek Goyal
24610333d5 blkio: Drop the reference to queue once the task changes cgroup
o If a task changes cgroup, drop reference to the cfqq associated with io
  context and set cfqq pointer stored in ioc to NULL so that upon next request
  arrival we will allocate a  new queue in new group.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03 19:28:52 +01:00
Vivek Goyal
8682e1f15f blkio: Provide some isolation between groups
o Do not allow following three operations across groups for isolation.
	- selection of co-operating queues
	- preemtpions across groups
	- request merging across groups.

o Async queues are currently global and not per group. Allow preemption of
  an async queue if a sync queue in other group gets backlogged.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03 19:28:52 +01:00
Vivek Goyal
220841906f blkio: Export disk time and sectors used by a group to user space
o Export disk time and sector used by a group to user space through cgroup
  interface.

o Also export a "dequeue" interface to cgroup which keeps track of how many
  a times a group was deleted from service tree. Helps in debugging.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03 19:28:52 +01:00
Vivek Goyal
2868ef7b39 blkio: Some debugging aids for CFQ
o Some debugging aids for CFQ.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03 19:28:52 +01:00
Vivek Goyal
b1c3576961 blkio: Take care of cgroup deletion and cfq group reference counting
o One can choose to change elevator or delete a cgroup. Implement group
  reference counting so that both elevator exit and cgroup deletion can
  take place gracefully.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Nauman Rafique <nauman@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03 19:28:52 +01:00
Vivek Goyal
25fb5169d4 blkio: Dynamic cfq group creation based on cgroup tasks belongs to
o Determine the cgroup IO submitting task belongs to and create the cfq
  group if it does not exist already.

o Also link cfqq and associated cfq group.

o Currently all async IO is mapped to root group.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03 19:28:52 +01:00